Compare commits

...

609 Commits

Author SHA1 Message Date
Philipp Hagemeister
8abbf43f21 release 2013.10.15 2013-10-15 12:06:45 +02:00
Philipp Hagemeister
10eaae48ff Merge branch 'master' of github.com:rg3/youtube-dl 2013-10-15 12:05:24 +02:00
Philipp Hagemeister
9d4660cab1 [generic] Support embedded vimeo videos (#1602) 2013-10-15 12:05:13 +02:00
Jaime Marquínez Ferrándiz
9d74e308f7 [sztvhu] Fix the title extraction 2013-10-15 08:22:59 +02:00
Jaime Marquínez Ferrándiz
e772692ffd Fix an import in the tests and the Youtube Shows test 2013-10-15 08:22:20 +02:00
Jaime Marquínez Ferrándiz
8381a92120 [websurg] Skipt the test
It needs login information.
2013-10-15 08:12:30 +02:00
Philipp Hagemeister
cd054fc491 Use upper-case for prefixes in help to signify bytes (#1043) 2013-10-15 04:53:02 +02:00
Philipp Hagemeister
f219743e33 Merge remote-tracking branch 'alphapapa/master' 2013-10-15 04:52:07 +02:00
Philipp Hagemeister
4f41664de8 Merge remote-tracking branch 'Rudloff/websurg' 2013-10-15 02:11:33 +02:00
Philipp Hagemeister
a4fd04158e Do not import * 2013-10-15 02:07:26 +02:00
Philipp Hagemeister
44a5f1718a Simplify tests
* Make them directly executable again
* Move common stuff (md5, parameters) to helper
* Never import *
* General clean up
2013-10-15 02:00:55 +02:00
Philipp Hagemeister
a623df4c7b Credit @Elbandi for sztvhu 2013-10-15 01:34:47 +02:00
Philipp Hagemeister
7cf67fbe29 [sztvhu] Simplify 2013-10-15 01:33:20 +02:00
Philipp Hagemeister
3ddf1a6d01 Merge remote-tracking branch 'Elbandi/master' 2013-10-15 01:26:34 +02:00
Philipp Hagemeister
850555c484 Merge remote-tracking branch 'origin/master' 2013-10-15 01:25:47 +02:00
Philipp Hagemeister
9ed3bdc64d [tudou] Add support for youku links (Closes #1571) 2013-10-15 01:20:04 +02:00
Jaime Marquínez Ferrándiz
c45aa56080 [gamespot] Fix video extraction (fixes #1587) 2013-10-14 16:46:07 +02:00
Philipp Hagemeister
7394b8db3b Merge remote-tracking branch 'origin/master' 2013-10-14 16:07:53 +02:00
Andras Elso
f9b3d7af47 Add an extractor for Szombathelyi TV 2013-10-14 13:07:47 +02:00
Filippo Valsorda
ea62a2da46 add VideoPremium.tv RTMP support 2013-10-14 01:32:47 -04:00
Filippo Valsorda
7468b6b71d Merge pull request #1569 from Jaiz909/1321-download-annotations
Added downloading annotations download support - closes #1321
2013-10-13 22:03:22 -07:00
Jai Grimshaw
1fb07d10a3 [youtube] Adds #1312 Download annotations
Adds #1321 Download annotations from youtube
Annotations are downloaded and written to a .annotations.xml file using the https://www.youtube.com/annotations_invideo?features=1&legacy=1&video_id=$VIDEOID API.
Added unit test for annotations.
2013-10-14 16:22:27 +11:00
Philipp Hagemeister
9378ae6e1d [youku] Allow shortcut youku:ID and make non-matching groups non-matching (#1571) 2013-10-13 15:55:05 +02:00
Philipp Hagemeister
06723d47c4 Merge remote-tracking branch 'jaimeMF/opus-fix' 2013-10-13 15:26:10 +02:00
Jaime Marquínez Ferrándiz
69a0c470b5 [arte] Add an extractor for future.arte.tv (closes #1593) 2013-10-13 14:21:13 +02:00
Jaime Marquínez Ferrándiz
c40f5cf45c [arte] add an extractor for creative.arte.tv (#1593)
The +7 videos now use an independent extractor that is also used for the creative videos
2013-10-13 13:54:31 +02:00
Jaime Marquínez Ferrándiz
4b7b839f24 Add an extractor for rottentomatoes.com and improve InternetVideoArchiveIE to get the best quality 2013-10-12 22:22:31 +02:00
Jaime Marquínez Ferrándiz
3d60d33773 Add an extractor for videodetective.com (closes #262)
It uses the internetvideoarchive.com platform.
2013-10-12 21:36:17 +02:00
Jaime Marquínez Ferrándiz
d7e66d39a0 Add an extractor for internetvideoarchive.com videos
It's used by videodetective.com
2013-10-12 21:34:04 +02:00
Filippo Valsorda
d3f46b9aa5 Add support for single-test tox runs
Use a sintax like
    tox test.test_download:TestDownload.test_NowVideo
to run the specific test on all the tox environments (Python versions)
2013-10-12 13:17:11 -04:00
Filippo Valsorda
f5e54a1fda add support for NowVideo.ch 2013-10-12 13:11:03 -04:00
Jaime Marquínez Ferrándiz
4eb7f1d12e FFmpegPostProcessor: print the command line used if the --verbose option is given 2013-10-12 13:49:27 +02:00
Jaime Marquínez Ferrándiz
0f6d12e43c Don't set the '-aq' option with the opus format (fixes #1263) 2013-10-12 13:30:30 +02:00
Jaime Marquínez Ferrándiz
b4cdc245cf Merge pull request #1590 from joeyadams/master
Fix Brightcove detection when another Flash object is on the page
2013-10-12 02:09:39 -07:00
Joey Adams
3283533149 Fix Brightcove detection when another Flash object is on the page
The regex used non-greedy match, but alas it failed on input like this:

    <object class="...> ... class="BrightcoveExperience"

It captured two objects and the intervening HTML.  This commit fixes this by
not allowing a ">" to appear before BrightcoveExperience.

Video in question: http://www.harpercollinschildrens.com/feature/petethecat/
2013-10-11 21:52:33 -04:00
Jaime Marquínez Ferrándiz
8032e31f2d Merge pull request #1558 from rzhxeo/cinemassacre
Add support for http://cinemassacre.com
2013-10-11 20:38:26 +02:00
Jaime Marquínez Ferrándiz
d2f9cdb205 Merge branch 'cinemassacre' of github.com:rzhxeo/youtube-dl into rzhxeo-cinemassacre 2013-10-11 19:53:27 +02:00
Jaime Marquínez Ferrándiz
3823342d9d [arte] Prepare for generic format support (#980) 2013-10-11 16:33:31 +02:00
Jaime Marquínez Ferrándiz
91dbaef406 [nhl] Add an extractor for videocenter's categories (#1586)
It downloads the last 12 videos.
2013-10-11 14:33:26 +02:00
Jaime Marquínez Ferrándiz
9026dd3858 Make sure it only runs rtmpdump one time in test mode and return True if the download can be resumed 2013-10-11 12:42:15 +02:00
Jaime Marquínez Ferrándiz
81d7f1928c Merge pull request #1565 from rzhxeo/rtmpdump_test
Only download 1 sec. with rtmpdump in test mode
2013-10-11 12:40:18 +02:00
Jaime Marquínez Ferrándiz
bc4f29170f Add a PostProcessor for adding metadata to the file (closes #1570)
It currently sets the title, the date and the author values.
2013-10-11 11:19:09 +02:00
Jaime Marquínez Ferrándiz
cb354c8f62 [yahoo] Download the info from another page
The 'meta' field is not always in the video webpage
2013-10-10 21:01:45 +02:00
Jaime Marquínez Ferrándiz
1cbb27b151 [gamespot] Mark as broken (#1587) 2013-10-10 19:55:52 +02:00
Jaime Marquínez Ferrándiz
0ab4ff6378 [mtv] Strip the description
There were some tabs and newlines added around the string.
2013-10-10 19:53:44 +02:00
Jaime Marquínez Ferrándiz
63da13e829 Add an extractor for faz.net (closes #1582) 2013-10-10 19:37:17 +02:00
Jaime Marquínez Ferrándiz
4193a453c2 Don't add extractors with IE_DESC set to False to the page of supported sites. 2013-10-10 16:18:02 +02:00
Jaime Marquínez Ferrándiz
2e1fa03bf5 Add an extractor for video.nhl.com (closes #1586) 2013-10-10 16:16:49 +02:00
Philipp Hagemeister
8f1ae18a18 release 2013.10.09 2013-10-09 23:50:47 +02:00
Philipp Hagemeister
57da92b7df [youtube] Do not recognize attribution link as user (Fixes #1573) 2013-10-09 23:50:38 +02:00
Jaime Marquínez Ferrándiz
df4f632dbc Merge pull request #1584 from wingsuit/master
Tiny tpo
2013-10-09 07:44:06 -07:00
Jaime Marquínez Ferrándiz
a34c2faae4 [youtube] set the 'name' parameter in the subtitles url (fixes #1577) 2013-10-09 16:41:36 +02:00
Tom
1d368c7589 Tiny tpo 2013-10-09 21:56:09 +08:00
Jaime Marquínez Ferrándiz
88bd97e34c [vevo] Some improvements (fixes #1580)
Extract the info from http://videoplayer.vevo.com/VideoService/AuthenticateVideo?isrc={id}
Some videos don't have an smil manifest, extract the video urls directly from the json and use the last version of the video.
Extract all the available formats and set the 'formats' field of the result
2013-10-08 21:25:38 +02:00
Jaime Marquínez Ferrándiz
2ae3edb1cf Fix the printing of the proxy map in debug mode
The proxies have to be extracted from the opener.handlers
2013-10-07 21:10:31 +02:00
Philipp Hagemeister
b2ad967e45 Simplify test setup 2013-10-07 19:06:36 +02:00
Philipp Hagemeister
a27b9e8bd5 Move opener setup into a separate helper function 2013-10-07 19:01:47 +02:00
Philipp Hagemeister
4481a754e4 release 2013.10.07 2013-10-07 14:34:19 +02:00
Philipp Hagemeister
faa6ef6bc8 [jeuxvideo] Improve code quality (fixes #1567) 2013-10-07 14:33:23 +02:00
Philipp Hagemeister
15870e90b0 Restore warning when user forgets to quote URL (#1396) 2013-10-07 12:21:24 +02:00
rzhxeo
8e4f824365 Remove test parameter from _download_with_rtmpdump 2013-10-06 22:04:32 +02:00
Jaime Marquínez Ferrándiz
387ae5f30b [vimeo] Recognize urls ending in a slash (fixes #1242) 2013-10-06 21:56:23 +02:00
rzhxeo
ad7a071ab6 Only download 1 sec. with rtmpdump in test mode 2013-10-06 20:55:24 +02:00
Philipp Hagemeister
1310bf2474 [redtube] add age_limit 2013-10-06 16:39:35 +02:00
Philipp Hagemeister
b24f347190 Merge branch 'download-archive'
Conflicts:
	youtube_dl/YoutubeDL.py
	youtube_dl/__init__.py
2013-10-06 16:30:26 +02:00
Philipp Hagemeister
ee6c9f95e1 Remove superfluous parenthesis 2013-10-06 16:28:36 +02:00
Philipp Hagemeister
2a69c6b879 Merge branch 'age_limit' 2013-10-06 16:23:18 +02:00
Philipp Hagemeister
cfadd183c4 Call extracted property age_limit everywhere 2013-10-06 16:23:06 +02:00
Philipp Hagemeister
e484c81f0c [generic] Clarify error messages 2013-10-06 16:03:18 +02:00
Philipp Hagemeister
7e5e8306fd release 2013.10.06 2013-10-06 07:13:14 +02:00
Philipp Hagemeister
41e8bca4d0 [viddler] Add basic support (Fixes #1520) 2013-10-06 07:12:47 +02:00
Philipp Hagemeister
8dbe9899a9 Allow users to specify an age limit (fixes #1545)
With these changes, users can now restrict what videos are downloaded by the intented audience, by specifying their age with --age-limit YEARS .
Add rudimentary support in youtube, pornotube, and youporn.
2013-10-06 06:08:56 +02:00
Philipp Hagemeister
f4aac741d5 Move try_rm to test helpers 2013-10-06 05:47:17 +02:00
Philipp Hagemeister
c1c9a79c49 Add basic --download-archive option
Often, users want to be able to download only videos they haven't seen before, despite the video files having been deleted or moved in the mean time.
When --download-archive FILE is given, the extractor and ID of every download is recorded in the specified file. If it is already present, the video in question is skipped.
2013-10-06 04:27:10 +02:00
Philipp Hagemeister
226113c880 Merge remote-tracking branch 'origin/tox' 2013-10-05 22:47:44 +02:00
Filippo Valsorda
8932a66e49 [fixup] remove unnecessary commented function 2013-10-05 16:38:37 -04:00
Filippo Valsorda
79cfb46d42 add tox configuration file for easy testing 2013-10-05 16:08:48 -04:00
Filippo Valsorda
00fcc17aee add capability to suppress expected warnings in tests 2013-10-05 15:55:58 -04:00
Philipp Hagemeister
e94b783c74 [googleplus] Fix upload_date detection 2013-10-05 16:38:33 +02:00
Philipp Hagemeister
97dae9ae07 [bliptv] Make sure video ID is a string 2013-10-05 16:12:29 +02:00
rzhxeo
ca215e0a4f [CinemassacreIE] Use MD5 to check in TEST description 2013-10-05 13:42:17 +02:00
rzhxeo
91a26ca559 [CinemassacreIE] Remove docstring from class 2013-10-05 13:40:05 +02:00
rzhxeo
1ece880d7c [CinemassacreIE] Add support for other embed methods 2013-10-05 13:36:13 +02:00
rzhxeo
400afddaf4 Add CinemassacreIE 2013-10-05 09:37:11 +02:00
Jaime Marquínez Ferrándiz
c3fef636b5 [dailymotion] Fix playlist extraction
The html code has changed, make the video ids extraction more solid.
2013-10-04 14:07:29 +02:00
Philipp Hagemeister
46e28a84ca [brightcove] Fix up some broken HTML (#1553) 2013-10-04 11:53:49 +02:00
Philipp Hagemeister
17ad2b3fb1 [yahoo] Switch ext of test 2013-10-04 11:44:56 +02:00
Philipp Hagemeister
5e2a60db4a [yahoo] Fix test title 2013-10-04 11:44:02 +02:00
Philipp Hagemeister
cd214418f6 [redtube] pep8 2013-10-04 11:41:57 +02:00
Philipp Hagemeister
ba2d9f213e [jeuxvideo] fix video file md5sum 2013-10-04 11:38:56 +02:00
Philipp Hagemeister
7f8ae73a5d Include length in player cache ID
Some videos use the same player with IDs of multiple lengths.
See https://travis-ci.org/rg3/youtube-dl/jobs/12126506#L319 for an example.
2013-10-04 11:36:06 +02:00
Philipp Hagemeister
466880f531 [yahoo] Do not try to run rtmpdump on travis 2013-10-04 11:34:12 +02:00
Philipp Hagemeister
9f1f6d2437 [rtlnow] Skip test on travis 2013-10-04 11:33:14 +02:00
Philipp Hagemeister
9e0f897f6b [francetv] Use common format for ID of generation-quoi subextractor 2013-10-04 11:30:47 +02:00
Philipp Hagemeister
c0f6aa876f Merge remote-tracking branch 'origin/master' 2013-10-04 11:14:20 +02:00
Philipp Hagemeister
d93bdee9a6 [comedycentral] Prepare for generic video extraction (#980) 2013-10-04 11:14:10 +02:00
Philipp Hagemeister
f13d09332d [mtv] Prepare for #980 2013-10-04 11:10:04 +02:00
Philipp Hagemeister
2f5865cc6d Clarify that url and ext are optional when formats is given (#980) 2013-10-04 11:09:43 +02:00
Philipp Hagemeister
deefc05b88 Document formats (for #980) 2013-10-04 10:40:42 +02:00
Philipp Hagemeister
0d8cb1cc14 [ted] Prepare #980 merge 2013-10-04 10:32:34 +02:00
Jaime Marquínez Ferrándiz
a90b9fd209 Merge pull request #1551 from rzhxeo/flickr
[FlickrIE] Fix HTTPS url
2013-10-03 23:14:12 -07:00
rzhxeo
829493439a [FlickrIE] Fix HTTPS url 2013-10-04 07:47:40 +02:00
Pierre Rudloff
73b4fafd82 Use self._download_webpage everywhere 2013-10-04 01:12:42 +02:00
Pierre Rudloff
b039775057 Unused variable 2013-10-04 01:07:24 +02:00
Pierre Rudloff
5c1d63b737 Changes suggested by @phihag 2013-10-04 01:04:38 +02:00
Philipp Hagemeister
3cd022f6e6 Merge remote-tracking branch 'rzhxeo/rtl_ntv' 2013-10-04 00:59:11 +02:00
Philipp Hagemeister
abefd1f7c4 Merge remote-tracking branch 'rzhxeo/rtl_upload_date' 2013-10-04 00:58:35 +02:00
Philipp Hagemeister
c21315f273 [youtube] new static 82 signature 2013-10-04 00:43:01 +02:00
Philipp Hagemeister
9ab1018b1a release 2013.10.04 2013-10-04 00:38:19 +02:00
Philipp Hagemeister
da0a5d2d6e [france2] Add support for URLs without video IDs (Fixes #1547) 2013-10-04 00:34:36 +02:00
Jaime Marquínez Ferrándiz
ee6adb166c [ign] Support more urls and detect multiple videos in articles (fixes #1543) 2013-10-02 20:59:34 +02:00
Philipp Hagemeister
be8fe32c92 Fix help of --cachedir 2013-10-02 14:37:19 +02:00
Philipp Hagemeister
c38b1e776d [youtube] Simplify cache_dir code (#1529) 2013-10-02 08:41:14 +02:00
Philipp Hagemeister
4f8bf17f23 Merge remote-tracking branch 'holomorph/master' 2013-10-02 08:23:53 +02:00
Philipp Hagemeister
ca40186c75 [youtube] Fix static 82 signature (Closes #1539) 2013-10-02 08:20:00 +02:00
Philipp Hagemeister
a8c6b24155 [youtube] Support videos without a title (Fixes #1391, Closes #1542) 2013-10-02 07:25:35 +02:00
Filippo Valsorda
bd8e5c7ca2 Merge pull request #1531 from rg3/no-playlist
[youtube] implement --no-playlist to only download current video
2013-10-01 10:08:20 -07:00
Filippo Valsorda
7c61bd36bb [youtube] correct --no-playlist for python3 2013-10-01 11:58:13 -04:00
Jaime Marquínez Ferrándiz
c54283824c [dailymotion] Detect vevo videos (fixes #1532)
All videos from the Vevo user, just embed videos from vevo.com
2013-10-01 15:05:41 +02:00
Philipp Hagemeister
52f15da2ca release 2013.10.01.1 2013-10-01 14:44:26 +02:00
Philipp Hagemeister
44d466559e Properly handle stream meap not being present 2013-10-01 14:44:09 +02:00
Philipp Hagemeister
05751eb047 release 2013.10.01 2013-10-01 11:43:54 +02:00
Philipp Hagemeister
f10503db67 Handle videos without url_encoded_fmt_stream_map (Fixes #1535) 2013-10-01 11:39:11 +02:00
rzhxeo
adfeafe9e1 [RTLnowIE] Allow video description without upload date
Some videos (feature films) have no upload date.
2013-10-01 07:22:49 +02:00
rzhxeo
4c62a16f4f [RTLnowIE] Add support for http://n-tvnow.de 2013-10-01 06:55:30 +02:00
rzhxeo
c0de39e6d4 Merge pull request #2 from rg3/master
Update
2013-09-30 21:39:58 -07:00
Mark Oteiza
fa55675593 Support XDG base directory specification 2013-09-30 18:22:38 -04:00
Filippo Valsorda
d4d9920a26 add test for --no-playlist 2013-09-30 18:01:17 -04:00
Filippo Valsorda
47192f92d8 implement --no-playlist to only download current video - closes #755 2013-09-30 16:26:25 -04:00
Jaime Marquínez Ferrándiz
722076a123 [rtlnow] Replace one of the tests
The video is no longer available.
2013-09-29 23:07:26 +02:00
Jaime Marquínez Ferrándiz
bb4aa62cf7 [appletrailers] The request for the settings must have the trailer name in lower case (fixes #1329) 2013-09-29 20:59:19 +02:00
Jaime Marquínez Ferrándiz
843530568f [appletrailers] Rework extraction (fixes #1387)
The exraction was broken:
* The includes page contains img elements that need to be fixed.
* Use the 'itunes.inc' page, it contains a json dictionary for each trailer with information.
* Get the formats from 'includes/settings{trailer_name}.json'
* Use urljoin to allow urls with a fragment identifier to work

Removed the thumbnail urls from the tests, they are different now.
2013-09-29 20:49:58 +02:00
Philipp Hagemeister
138a5454b5 release 2013.09.29 2013-09-29 14:38:37 +02:00
Philipp Hagemeister
d279037036 [update] Prevent cmd window popup on Windows (Fixes #1478) 2013-09-29 14:37:06 +02:00
Philipp Hagemeister
46353f6783 [update] Look for .exe extension on Windows (Fixes #745) 2013-09-29 14:37:00 +02:00
Jaime Marquínez Ferrándiz
70922df8b5 [dailymotion] Disable the family filter in the playlists (fixes #1524) 2013-09-29 12:44:02 +02:00
Jaime Marquínez Ferrándiz
9c15e9de84 [yahoo] Fix video extraction (fixes #1521)
There's no need to use two different methods.
Now we can also download videos over http if possible.
Also run the test for rtmp videos, but skip the download.
2013-09-28 21:19:52 +02:00
Philipp Hagemeister
123c10608d Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-28 15:43:38 +02:00
Philipp Hagemeister
0b7c2485b6 [zdf] Add support for hash URLs and simplify (#1518) 2013-09-28 15:43:34 +02:00
Jaime Marquínez Ferrándiz
9abb32045a [youtube] Add hlsvp to the error message if it can't be found and remove the live stream test
It's no longer available, other olympics streams have the same problem.
2013-09-27 15:06:27 +02:00
Jaime Marquínez Ferrándiz
f490e77e77 [youtube] Set the thumbnail to None if it can't be extracted 2013-09-27 14:22:36 +02:00
Jaime Marquínez Ferrándiz
2dc592991a [youtube] update description of test 2013-09-27 14:20:52 +02:00
Jaime Marquínez Ferrándiz
0a60edcfa9 Don't fail if the video thumbnail couldn't be downloaded (fixes #1516)
Just report a warning
2013-09-27 14:19:19 +02:00
Philipp Hagemeister
c53f9d30c8 Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-27 13:09:58 +02:00
Philipp Hagemeister
509f398292 Remove youtube_genalgo (#1515)
With the automatic signature extraction, this script has become superfluous now
2013-09-27 13:09:24 +02:00
Jaime Marquínez Ferrándiz
74bab3f0a4 Don't embed subtitles if the list is empty or the field is not set (fixes #1510) 2013-09-27 08:08:43 +02:00
Philipp Hagemeister
8574862991 Merge remote-tracking branch 'rzhxeo/RTL_T' 2013-09-27 06:25:04 +02:00
Philipp Hagemeister
2de957c7e1 Merge remote-tracking branch 'rzhxeo/RTL' 2013-09-27 06:23:10 +02:00
Philipp Hagemeister
920de7a27d [youtube] Fix 83 signature (Closes #1511) 2013-09-27 06:15:21 +02:00
rzhxeo
63efc427cd [RTLnowIE] Clean video title
The title of some videos has the following format:
Series - Episode | Series online schauen bei ... NOW
2013-09-27 06:00:37 +02:00
rzhxeo
ce65fb6c76 [RTLnowIE] Add support for http://rtlnitronow.de 2013-09-27 05:50:16 +02:00
Jaime Marquínez Ferrándiz
4de1994b6e [brightcove] Use direct url for the tests
The test_all_urls.py test failed because BrightcoveIE doesn't match them.
2013-09-26 18:59:56 +02:00
Jaime Marquínez Ferrándiz
592882aa9f [brightcove] Support videos that only provide flv versions (fixes #1504)
Moved the test from generic.py to brightcove.py
2013-09-26 13:54:31 +02:00
Philipp Hagemeister
b98d6a1e19 release 2013.09.24.2 2013-09-24 21:55:34 +02:00
Philipp Hagemeister
29c7a63df8 Remove debugging code 2013-09-24 21:55:25 +02:00
Philipp Hagemeister
8b25323ae2 release 2013.09.24.1 2013-09-24 21:40:47 +02:00
Philipp Hagemeister
f426de8460 Merge remote-tracking branch 'origin/master' 2013-09-24 21:40:30 +02:00
Philipp Hagemeister
695dc094ab Merge branch 'automatic-signatures' 2013-09-24 21:40:08 +02:00
Jaime Marquínez Ferrándiz
e80d861064 Revert "[southparkstudios] Fix mgid extraction"
This reverts commit 0fd49457f5.

It seems that the redesign was temporary.
2013-09-24 21:39:38 +02:00
Philipp Hagemeister
2cdeb20135 release 2013.09.24 2013-09-24 21:28:06 +02:00
Philipp Hagemeister
7f74773254 Add option --no-cache-dir 2013-09-24 21:26:10 +02:00
Philipp Hagemeister
f2c327fd39 Fix 86 signature (#1494) 2013-09-24 21:20:42 +02:00
Philipp Hagemeister
e35e4ddc9a Fix output of --youtube-print-sig-code when counting down to 0 2013-09-24 21:18:03 +02:00
Philipp Hagemeister
c3c88a2664 Allow opts.cachedir == None to disable cache 2013-09-24 21:04:43 +02:00
Jaime Marquínez Ferrándiz
bb0eee71e7 [youtube] Update one of the test's description 2013-09-24 21:04:13 +02:00
Jaime Marquínez Ferrándiz
6f56389b88 [youtube] update algos for length 86 and 84 (fixes #1494) 2013-09-24 21:02:00 +02:00
Jaime Marquínez Ferrándiz
5b333c1ce6 [francetv] Add an extractor for Generation Quoi (closes #1475) 2013-09-23 21:41:54 +02:00
Jaime Marquínez Ferrándiz
a825f33030 [francetv] Add an extractor for France2 2013-09-23 21:28:33 +02:00
Philipp Hagemeister
92f618f2e2 Merge remote-tracking branch 'origin/master' 2013-09-23 11:24:49 +02:00
Philipp Hagemeister
81ec7c7901 [facebook] Allow untitled videos (Fixes #1484) 2013-09-23 11:24:33 +02:00
Jaime Marquínez Ferrándiz
dd5d2eb03c If the file is already downloaded include the size in the progress hook 2013-09-22 23:39:30 +02:00
Jaime Marquínez Ferrándiz
4ae720042c Include the eta and the speed in the progress hooks
Useful when listening to the progress hook, for example in a GUI.
2013-09-22 23:31:39 +02:00
Philipp Hagemeister
c705320f48 Correct test strings 2013-09-22 12:18:16 +02:00
Philipp Hagemeister
d2d8f89531 Do not warn if fallback is without alternatives (because we did not get the flash player URL) 2013-09-22 12:18:10 +02:00
Philipp Hagemeister
bdde940e90 [youtube] Improve flash player URL handling 2013-09-22 12:17:42 +02:00
Philipp Hagemeister
45f4a76dbc Work around nosetests nosiness 2013-09-22 11:45:29 +02:00
Philipp Hagemeister
13dc64ce74 [youtube] Remove _decrypt_signature_age_gate 2013-09-22 11:17:21 +02:00
Philipp Hagemeister
c35f9e72ce Move cachedir doc 2013-09-22 11:09:25 +02:00
Philipp Hagemeister
f8061589e6 [youtube] Actually pass in cachedir option 2013-09-22 10:51:33 +02:00
Philipp Hagemeister
0ca96d48c7 [youtube] Improve source code quality 2013-09-22 10:37:23 +02:00
Philipp Hagemeister
4ba146f35d Update static signatures 2013-09-22 10:31:25 +02:00
Philipp Hagemeister
edf3e38ebd [youtube] Improve cache and add an option to print the extracted signatures 2013-09-22 10:30:02 +02:00
Philipp Hagemeister
c4417ddb61 [youtube] Add filesystem signature cache 2013-09-22 00:35:03 +02:00
tewe
4a2080e407 [youku] better error handling
blocked videos used to cause death by TypeError, now we report what the
server says
2013-09-21 20:50:31 +02:00
Philipp Hagemeister
2f2ffea9ca Clarify a couple of calls 2013-09-21 15:34:29 +02:00
Philipp Hagemeister
ba552f542f Use reader instead of indexing 2013-09-21 15:32:37 +02:00
Philipp Hagemeister
8379969834 Prepare signature function caching 2013-09-21 15:19:48 +02:00
Philipp Hagemeister
95dbd2f990 Change test target (Verified with node.js) 2013-09-21 15:10:38 +02:00
Philipp Hagemeister
a7177865b1 Implement more opcodes 2013-09-21 14:48:12 +02:00
Philipp Hagemeister
e0df6211cc Restore accidentally deleted commits
That's what happens if you let Windows machines write :(
2013-09-21 14:40:35 +02:00
Jaime Marquínez Ferrándiz
b00ca882a4 [livestream] Fix events extraction (fixes #1467) 2013-09-21 13:50:52 +02:00
Jaime Marquínez Ferrándiz
39baacc49f [dailymotion] Add an extractor for users (closes #1476) 2013-09-21 12:45:53 +02:00
Jaime Marquínez Ferrándiz
3a1d48d6de [dailymotion] Raise ExtractorError if the dailymotion response reports an error 2013-09-21 12:15:54 +02:00
Philipp Hagemeister
34308b30d6 Warn if no locale is set (#1474) 2013-09-21 11:48:07 +02:00
Philipp Hagemeister
bc1506f8c0 Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-21 11:10:30 +02:00
Philipp Hagemeister
b61067fa4f Abort if extractaudio is given without a variable extension (#1470) 2013-09-21 11:10:22 +02:00
Jaime Marquínez Ferrándiz
69b227a9bc [southparkstudios] add support for http://www.southparkstudios.com/full-episodes/* urls (closes #1469) 2013-09-21 10:58:43 +02:00
Jaime Marquínez Ferrándiz
0fd49457f5 [southparkstudios] Fix mgid extraction 2013-09-21 10:51:25 +02:00
Philipp Hagemeister
58f289d013 release 2013.09.20.1 2013-09-20 22:59:14 +02:00
Jaime Marquínez Ferrándiz
3d60bb96e1 Add an extractor for ebaumsworld.com (closes #1462) 2013-09-20 16:55:50 +02:00
Jaime Marquínez Ferrándiz
38d025b3f0 [youtube] add algo for length 91 2013-09-20 14:43:16 +02:00
Jaime Marquínez Ferrándiz
c40c6aaaaa Catch socket.error before IOError
Since python 2.6 it's a child class.
2013-09-20 13:26:03 +02:00
Jaime Marquínez Ferrándiz
1a810f0d4e [funnyordie] Fix video url extraction 2013-09-20 13:05:34 +02:00
Philipp Hagemeister
63037593c0 release 2013.09.20 2013-09-20 10:24:48 +02:00
Jaime Marquínez Ferrándiz
7a878d47fa Merge pull request #1464 from patrickslin/patch-7
Unable to decrypt signature length 93 (fixes #1461)
2013-09-20 08:25:10 +02:00
patrickslin
bc4b900898 Unable to decrypt signature length 93 (fixes #1461) 2013-09-19 21:49:06 -07:00
Jaime Marquínez Ferrándiz
c5e743f66f [fktv] support videos splitted in any number of parts and some style changes 2013-09-18 23:32:37 +02:00
Jaime Marquínez Ferrándiz
6c36d8d6fb Merge pull request #1438 from rzhxeo/fktv
Add support for http://fernsehkritik.tv
2013-09-18 23:05:56 +02:00
Jaime Marquínez Ferrándiz
71c82637e7 [youtube] apply the fix for lists with number of videos multiple of _MAX_RESULTS to user extraction
Copied from the playlist extractor.
2013-09-18 23:00:32 +02:00
Philipp Hagemeister
2dad310e2c Credit @Ruirize for newgrounds 2013-09-18 22:30:22 +02:00
Philipp Hagemeister
d0ae9e3a8d [newgrounds] simplify 2013-09-18 22:14:43 +02:00
Ruirize
a19413c311 Changed file hash. 2013-09-18 17:17:12 +01:00
Ruirize
1ef80b55dd Fixes test fail
Was unaware of --id being passed to test.
2013-09-18 16:23:38 +01:00
Ruirize
eb03f4dad3 Added Newgrounds support 2013-09-18 15:54:45 +01:00
Philipp Hagemeister
830dd1944a Clarify -i help (#1453) 2013-09-18 13:23:04 +02:00
Pierre Rudloff
cc6943e86a Improvements 2013-09-18 00:07:04 +02:00
rzhxeo
1237c9a3a5 XHamsterIE: Fix support for new HD video url format and add test (closes PR #1443) 2013-09-17 23:08:01 +02:00
Pierre Rudloff
8f77093262 Merge remote-tracking branch 'upstream/master' into websurg 2013-09-17 23:07:44 +02:00
Jaime Marquínez Ferrándiz
5d13df79a5 [francetv] Remove Pluzz test
Videos expire in 7 days
2013-09-17 22:49:43 +02:00
Pierre Rudloff
d79a0e233a Extractor for websurg.com 2013-09-17 22:13:40 +02:00
Jaime Marquínez Ferrándiz
6523223a4c [hotnewhiphop] Fix test case title 2013-09-17 21:10:57 +02:00
Jaime Marquínez Ferrándiz
4a67aafb7e [youtube] Don't search the flash player version for videos with age gate activated 2013-09-17 20:59:55 +02:00
Philipp Hagemeister
f3f34c5b0f release 2013.09.17 2013-09-17 17:00:20 +02:00
Philipp Hagemeister
6ae8ee3f54 Update 85 signature (Fixes #1449)
This is the first signature algorithm to have been parsed automatically, although that only works for HTML5 players for now, and is not yet integrated into master.
2013-09-17 16:59:13 +02:00
Jaime Marquínez Ferrándiz
e8f8e80097 Add an extractor for vice.com (closes #1051) 2013-09-16 20:58:36 +02:00
Jaime Marquínez Ferrándiz
4dc0ff3ecf [ooyala] prefer ipad url
It has better quality with m3u8 manifests
2013-09-16 20:38:54 +02:00
Jaime Marquínez Ferrándiz
4b6462fc1e Add an extractor for Bloomberg (closes #1436) 2013-09-16 20:38:48 +02:00
Jaime Marquínez Ferrándiz
c4ece78564 [ooyala] add support for more type of video urls, like m3u8 manifests. 2013-09-16 19:34:10 +02:00
rzhxeo
0761d02b0b Add FKTV extractor 2013-09-16 14:46:19 +02:00
rzhxeo
71c107fc57 Add FKTV extractor
Support for Fernsehkritik-TV (incl. Postecke)
2013-09-16 14:45:14 +02:00
Philipp Hagemeister
7459e3a290 Always correct encoding when writing to sys.stderr (Fixes #1435) 2013-09-16 06:55:41 +02:00
Philipp Hagemeister
f9e66fb993 release 2013.09.16 2013-09-16 04:12:57 +02:00
Philipp Hagemeister
6c603ccce3 [devscripts/release] temporary workarounds 2013-09-16 04:12:43 +02:00
Philipp Hagemeister
ef66b0c6ef Merge remote-tracking branch 'origin/master' 2013-09-16 03:32:53 +02:00
Philipp Hagemeister
22b50ecb2f Starts of a Windows service 2013-09-16 03:32:45 +02:00
Jaime Marquínez Ferrándiz
5a6fecc3de Add an extractor for southparkstudios.com (closes #1434)
It uses the MTV system
2013-09-15 23:30:58 +02:00
Jaime Marquínez Ferrándiz
cdbccafed9 Merge pull request #1422 from rzhxeo/xhamster
XHamsterIE: Add support for new URL format (download in hd by default)
2013-09-15 12:18:39 +02:00
Jaime Marquínez Ferrándiz
e69ae5b9e7 [youtube] support youtube.googleapis.com/v/* urls (fixes #1425) 2013-09-15 12:14:59 +02:00
Jaime Marquínez Ferrándiz
92790f4e54 [soundcloud] Add an extractor for users (closes #1426) 2013-09-14 21:41:49 +02:00
Jaime Marquínez Ferrándiz
471a5ee908 Set the ext field for each format 2013-09-14 14:45:04 +02:00
Jaime Marquínez Ferrándiz
19e1d35989 [mixcloud] Rewrite extractor (fixes #278) 2013-09-14 14:26:42 +02:00
Jaime Marquínez Ferrándiz
0b7f31184d Now --all-sub is a modifier to --write-sub and --write-auto-sub (closes #1412)
For keeping backwards compatibility --all-sub sets --write-sub if --write-auto-sub is not given
2013-09-14 11:14:40 +02:00
Jaime Marquínez Ferrándiz
fad84d50fe [googleplus] Fix upload date extraction 2013-09-14 11:10:01 +02:00
rzhxeo
9a1c32dc54 XHamsterIE: Add support for new URL format 2013-09-14 05:42:00 +02:00
Jaime Marquínez Ferrándiz
a921f40799 [ustream] Simplify channel extraction
the ChannelParser has been moved to a new function in utils get_meta_content
Instead of the SocialStreamParser now it uses a regex
2013-09-13 22:05:29 +02:00
Jaime Marquínez Ferrándiz
74ac9bdd82 Merge pull request #1413 from tewe/master
Add Ustream channel support
2013-09-13 21:34:31 +02:00
Jaime Marquínez Ferrándiz
94518f2087 Merge pull request #1409 from JohnyMoSwag/master (closes #1404)
added kickstarter IE
2013-09-13 19:52:56 +02:00
Jaime Marquínez Ferrándiz
535f59bbcf Merge pull request #1350 from Jaiz909/description-keyerror-fix
Fixed issue #1277 KeyError when no description.
2013-09-13 18:20:42 +02:00
Philipp Hagemeister
71cedb3c0c [buildserver] Service installation and uninstallation 2013-09-13 02:25:12 +02:00
Jaime Marquínez Ferrándiz
dd01d6558a [gamespot] Update test video title 2013-09-12 22:18:39 +02:00
Jaime Marquínez Ferrándiz
ce85f022d2 [youtube] update algo for length 82 (fixes #1416) 2013-09-12 22:04:09 +02:00
Jaime Marquínez Ferrándiz
ad94a6fe44 [canalplust] accept urls that don't include the video id (fixes #1415), extract more info and update test 2013-09-12 21:56:36 +02:00
Philipp Hagemeister
353ba14060 [buildserver] Rely on repository license 2013-09-12 16:34:24 +02:00
Philipp Hagemeister
83de794223 Add original buildserver from @fraca7 2013-09-12 16:30:43 +02:00
tewe
bfd5c93af9 Add Ustream channel support 2013-09-12 12:30:14 +02:00
Jaime Marquínez Ferrándiz
c247d87ef3 [funnyordie] fix video url extraction 2013-09-12 11:31:27 +02:00
Philipp Hagemeister
07ac9e2cc2 release 2013.09.12 2013-09-12 11:26:44 +02:00
Jaime Marquínez Ferrándiz
6bc520c207 Check for both automatic captions and subtitles with options --write-sub and --write-auto-sub (fixes #1224) 2013-09-12 11:15:25 +02:00
Johny Mo Swag
f1d20fa39f added kickstarter IE 2013-09-11 14:50:38 -07:00
Jaime Marquínez Ferrándiz
e3dc22ca3a [youtube] Fix detection of videos with automatic captions 2013-09-11 19:24:56 +02:00
Jaime Marquínez Ferrándiz
d665f8d3cb [subtitles] Also list the available automatic captions languages with '--list-sub' 2013-09-11 19:17:30 +02:00
Jaime Marquínez Ferrándiz
055e6f3657 [youtube] Support automatic captions with original language different from English (fixes #1225) and download in multiple languages. 2013-09-11 19:08:43 +02:00
Jaime Marquínez Ferrándiz
ac4f319ba1 Credit @iemejia 2013-09-11 17:58:51 +02:00
Jaime Marquínez Ferrándiz
542cca0e8c Merge branch 'subtitles_rework' (closes PR #1326) 2013-09-11 17:41:24 +02:00
Jaime Marquínez Ferrándiz
6a2449df3b [howcast] Do not download from http://www.howcast.com/videos/{video_id}
It takes too much to follow the redirection.
2013-09-11 17:36:23 +02:00
Jaime Marquínez Ferrándiz
7fad1c6328 [subtitles] Use self._download_webpage for extracting the subtitles
It raises ExtractorError for the same exceptions we have to catch.
2013-09-11 16:24:47 +02:00
Jaime Marquínez Ferrándiz
d82134c339 [subtitles] Simplify the extraction of subtitles in subclasses and remove NoAutoSubtitlesInfoExtractor
Subclasses just need to call the method extract_subtitles, which will call _extract_subtitles and _request_automatic_caption
Now the default implementation of _request_automatic_caption returns {}.
2013-09-11 16:05:49 +02:00
Jaime Marquínez Ferrándiz
54d39d8b2f [subtitles] rename SubitlesIE to SubtitlesInfoExtractor
Otherwise it can be automatically detected as a IE ready for use.
2013-09-11 15:51:04 +02:00
Jaime Marquínez Ferrándiz
de7f3446e0 [youtube] move subtitles methods from the base extractor to YoutubeIE 2013-09-11 15:48:23 +02:00
Ismael Mejia
f8e52269c1 [subtitles] made inheritance hierarchy flat as requested 2013-09-11 15:21:09 +02:00
Ismael Mejia
cf1dd0c59e Merge branch 'master' into subtitles_rework 2013-09-11 14:26:48 +02:00
Jaime Marquínez Ferrándiz
22c8b52545 In the supported sites page, sort the extractors in case insensitive 2013-09-11 12:04:27 +02:00
Philipp Hagemeister
1f7dc42cd0 release 2013.11.09 2013-09-11 11:30:10 +02:00
Jaime Marquínez Ferrándiz
aa8f2641da [youtube] update algo for length 85 (fixes #1408 and fixes #1406) 2013-09-11 11:24:58 +02:00
Jaime Marquínez Ferrándiz
648d25d43d [francetv] Add an extractor for francetvinfo.fr (closes #1317)
It uses the same system as Pluzz, create a base class for both extractors.
2013-09-10 15:50:34 +02:00
Philipp Hagemeister
df3e61003a Merge pull request #1402 from Rudloff/canalc2
Wrong property name
2013-09-10 03:19:37 -07:00
Pierre Rudloff
6b361ad5ee Wrong property name 2013-09-10 12:13:22 +02:00
Pierre Rudloff
5d8afe69f7 Add an extractor for pluzz.francetv.fr (closes PR #1399) 2013-09-10 12:00:00 +02:00
Philipp Hagemeister
a1ab553858 release 2013.09.10 2013-09-10 11:25:11 +02:00
Jaime Marquínez Ferrándiz
07463ea162 Add an extractor for Slideshare (closes #1400) 2013-09-10 11:19:58 +02:00
Jaime Marquínez Ferrándiz
6d2d21f713 [sohu] add support for my.tv.sohu.com urls (fixes #1398) 2013-09-09 19:56:16 +02:00
Jaime Marquínez Ferrándiz
061b2889a9 Fix the minutes part in FileDownloader.format_seconds (fixed #1397)
It printed for the minutes the result of (seconds // 60)
2013-09-09 10:38:54 +02:00
Jaime Marquínez Ferrándiz
8963d9c266 [youtube] Modify the regex to match ids of length 11 (fixes #1396)
In urls like http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930 you can't split the query string and ids always have that length.
2013-09-09 10:33:12 +02:00
Jaime Marquínez Ferrándiz
890f62e868 Revert "[youtube] Fix detection of tags from HLS videos."
They have undo the change

This reverts commit 0638ad9999.
2013-09-08 18:50:07 +02:00
Philipp Hagemeister
8f362589a5 release 2013.09.07 2013-09-07 22:29:15 +02:00
Philipp Hagemeister
a27a2470cd Merge branch 'master' of github.com:rg3/youtube-dl 2013-09-07 22:28:54 +02:00
Ismael Mejia
72836fcee4 Merge branch 'master' into subtitles_rework 2013-09-06 23:24:41 +02:00
Jaime Marquínez Ferrándiz
a7130543fa [generic] If the url doesn't specify the protocol, then try to extract prepending 'http://' 2013-09-06 18:39:35 +02:00
Jaime Marquínez Ferrándiz
a490fda746 [daylimotion] accept embed urls (fixes #1386) 2013-09-06 18:36:07 +02:00
Jaime Marquínez Ferrándiz
7e77275293 Add an extractor for Metacritic 2013-09-06 18:08:07 +02:00
Ismael Mejia
d6e203b3dc [subtitles] fixed multiple subtitles language separated by comma after merge
As mentioned in the pull request, I forgot to include this changes.
aa6a10c44a
2013-09-06 16:30:13 +02:00
Jaime Marquínez Ferrándiz
e3ea479087 [youtube] Fix some issues with the detection of playlist/channel urls (reported in #1374)
They were being caught by YoutubeUserIE, now it only extracts a url if the rest of extractors aren't suitable.
Now the url tests check that the urls can only be extracted with an specific extractor.
2013-09-06 16:24:24 +02:00
Jaime Marquínez Ferrándiz
faab1d3836 [youtube] Fix detection of feeds urls (fixes #1294)
Urls like https://www.youtube.com/feed/watch_later were being as users (before the last changes to YoutubeUserIE, as videos)
2013-09-06 14:45:49 +02:00
Philipp Hagemeister
8851a574a3 Fix add-versions 2013-09-06 11:07:34 +02:00
Philipp Hagemeister
59282080c8 release 2013.09.06.1 2013-09-06 10:53:35 +02:00
Philipp Hagemeister
98f3da4040 Merge remote-tracking branch 'origin/master' 2013-09-06 10:53:24 +02:00
Philipp Hagemeister
1d213233cd Do not re-download files for hashsum generation (Fixes #1383) 2013-09-06 10:51:53 +02:00
Jaime Marquínez Ferrándiz
fd9cf73836 [youtube] Users: download from the api in json to simplify extraction (fixes #1358)
There could be duplicate videos or other videos if the description have links.
2013-09-06 10:43:02 +02:00
Jaime Marquínez Ferrándiz
0638ad9999 [youtube] Fix detection of tags from HLS videos. 2013-09-06 10:25:31 +02:00
Philipp Hagemeister
1eb527692a release 2013.09.06 2013-09-06 10:13:33 +02:00
Jaime Marquínez Ferrándiz
09bb17e108 Merge pull request #1378 from patrickslin/patch-6
Vevo sig changed again, please update for us! Thanks very much! (fixes #...
2013-09-06 09:53:23 +02:00
patrickslin
1cf911bc82 Vevo sig changed again, please update for us! Thanks very much! (fixes #1375) 2013-09-05 17:38:03 -07:00
Jaime Marquínez Ferrándiz
f4b052321b [youtube] Urls like youtube.com/NASA are now interpreted as users (fixes #1069)
Video urls like http://youtube.com/BaW_jenozKc are not valid, but http://youtu.be/BaW_jenozKc is correct.
2013-09-05 22:39:15 +02:00
Philipp Hagemeister
a636203ea5 release 2013.09.05 2013-09-05 22:30:50 +02:00
Jaime Marquínez Ferrándiz
c215217e39 [youtube] Playlists: extract the videos id from ['media$group']['yt$videoid'] (fixes #1374)
'media$player' is not defined for private videos.
2013-09-05 21:40:04 +02:00
Jaime Marquínez Ferrándiz
08e291b54d [generic] Recognize html5 video in the format '<video src=".+?"' and only unquote the url when extracting the id (fixes #1372) 2013-09-05 18:02:17 +02:00
Jaime Marquínez Ferrándiz
6b95b065be Add extractor for tvcast.naver.com (closes #1331) 2013-09-05 10:53:40 +02:00
Jaime Marquínez Ferrándiz
9363169b67 [daum] Get the video page from a canonical url to extract the full id (fixes #1373) and extract description. 2013-09-05 10:08:17 +02:00
Jaime Marquínez Ferrándiz
085bea4513 Credit @Huarong for tv.sohu.com 2013-09-04 22:09:22 +02:00
Jaime Marquínez Ferrándiz
150f20828b Add extractor for daum.net (closes #1330) 2013-09-04 22:06:50 +02:00
Philipp Hagemeister
08523ee20a release 2013.09.04 2013-09-04 14:33:32 +02:00
Philipp Hagemeister
5d5171d26a Merge pull request #1341 from xanadu/master
add support for "-f mp4" for YouTube
2013-09-03 18:52:12 -07:00
Allan Zhou
96fb5605b2 AHLS -> Apple HTTP Live Streaming 2013-09-03 18:49:35 -07:00
Philipp Hagemeister
7011de0bc2 Merge pull request #1363 from Rudloff/defense
defense.gouv.fr
2013-09-03 18:23:08 -07:00
Allan Zhou
c3dd69eab4 Merge remote-tracking branch 'upstream/master' 2013-09-03 12:22:29 -07:00
Pierre Rudloff
025171c476 Suggested by @phihag 2013-09-03 12:03:19 +02:00
Jaime Marquínez Ferrándiz
c8dbccde30 [orf] Remove the test video, they seem to expire in one week 2013-09-03 11:51:01 +02:00
Jaime Marquínez Ferrándiz
4ff7a0f1f6 [dailymotion] improve the regex for extracting the video info 2013-09-03 11:33:59 +02:00
Jaime Marquínez Ferrándiz
9c2ade40de [vimeo] Handle Assertions Error when trying to get the description
In some pages the html tags are not closed, python 2.6 cannot handle it.
2013-09-03 11:11:36 +02:00
Jaime Marquínez Ferrándiz
aa32314d09 [vimeo] add support for videos that embed the download url in the player page (fixes #1364) 2013-09-03 10:48:56 +02:00
Pierre Rudloff
52afe99665 Extractor for defense.gouv.fr 2013-09-03 01:51:17 +02:00
Pierre Rudloff
b0446d6a33 Merge remote-tracking branch 'upstream/master' 2013-09-03 01:27:49 +02:00
Jaime Marquínez Ferrándiz
8e4e89f1c2 Add an extractor for VeeHD (closes #1359) 2013-09-02 11:54:09 +02:00
Jaime Marquínez Ferrándiz
6c758d79de [metacafe] Add more cases for detecting the uploader detection (reported in #1343) 2013-08-31 22:35:39 +02:00
Jaime Marquínez Ferrándiz
691008087b Add an automatic page generator for the supported sites (related #156)
They are listed in the "supportedsites.html" page.
2013-08-31 15:18:52 +02:00
Allan Zhou
85f03346eb Merge remote-tracking branch 'upstream/master' 2013-08-30 17:51:59 -07:00
Allan Zhou
bdc6b3fc64 add support for "-f mp4" for YouTube 2013-08-30 17:51:50 -07:00
Pierre Rudloff
847f582290 Merge remote-tracking branch 'upstream/master' 2013-08-31 00:37:29 +02:00
Philipp Hagemeister
10f5c016ec release 2013.08.30 2013-08-30 21:02:07 +02:00
Jaime Marquínez Ferrándiz
2e756879f1 [youtube] update algo for length 86 2013-08-30 20:49:51 +02:00
Jaime Marquínez Ferrándiz
c7a7750d3b [youtube] Fix typo in the _VALID_URL for YoutubeFavouritesIE, it was intended to also match :ytfavourites 2013-08-30 20:13:05 +02:00
Jaime Marquínez Ferrándiz
9193c1eede Add youtube keywords to the bash completion script 2013-08-30 20:11:53 +02:00
Jai Grimshaw
b3f0e53048 Fixed issue #1277 KeyError when no description.
Allows a continue with a warning when an extractor cannot retrieve a description.
2013-08-31 01:53:01 +10:00
Philipp Hagemeister
3243d0f7b6 release 2013.08.29 2013-08-29 23:29:34 +02:00
Jaime Marquínez Ferrándiz
23b00bc0e4 [youtube] update algo for length 84
Only appears sometimes, nearly identical to length 86.
2013-08-29 22:44:29 +02:00
Jaime Marquínez Ferrándiz
52e1eea18b [youtube] update algo for length 86 (fixes #1349) 2013-08-29 22:33:58 +02:00
Jaime Marquínez Ferrándiz
ee80d66727 [ign] update 1up extractor to work with the updated IGNIE 2013-08-29 21:51:09 +02:00
Jaime Marquínez Ferrándiz
f1fb2d12b3 [ign] extract videos from articles pages 2013-08-29 21:39:36 +02:00
Jaime Marquínez Ferrándiz
deb2c73212 Merge pull request #1347 from whydoubt/fix_orf_at
Fix orf.at extractor by adding file coding mark
2013-08-29 11:05:38 -07:00
Jeff Smith
8928491074 Fix orf.at extractor by adding file coding mark 2013-08-29 12:51:38 -05:00
Jaime Marquínez Ferrándiz
545434670b Add an extractor for orf.at (closes #1346)
Make find_xpath_attr also accept numbers in the value
2013-08-29 19:16:07 +02:00
Jaime Marquínez Ferrándiz
54fda45bac Merge pull request #1342 from whydoubt/fix_mit_26
Fix MIT extractor for Python 2.6
2013-08-29 13:42:08 +02:00
Jaime Marquínez Ferrándiz
c7bf7366bc Update descriptions checksum for some test for Unistra and Youtube 2013-08-29 13:41:59 +02:00
Jaime Marquínez Ferrándiz
b7052e5087 Also print the field that fails if it is a md5 checksum 2013-08-29 12:15:45 +02:00
Jaime Marquínez Ferrándiz
0d75ae2ce3 Fix detection of the webpage charset if it's declared using ' instead of "
Like in "<meta charset='utf-8'/>"
2013-08-29 11:35:15 +02:00
Jeff Smith
b5ba7b9dcf Fix MIT extractor for Python 2.6
The HTML for the MIT page does not parse cleanly for Python 2.6 due
to script tags within an actual script element.  The offending piece
is inside a comment block, so removing all such comment blocks
fixes the parsing.
2013-08-28 14:24:42 -05:00
Allan Zhou
483e0ddd4d Merge remote-tracking branch 'upstream/master' 2013-08-28 10:19:28 -07:00
Philipp Hagemeister
2891932bf0 release 2013.08.28.1 2013-08-28 19:00:17 +02:00
Allan Zhou
591078babf Merge remote-tracking branch 'upstream/master' 2013-08-28 09:57:28 -07:00
Philipp Hagemeister
9868c781a1 Merge remote-tracking branch 'origin/master' 2013-08-28 18:22:33 +02:00
Philipp Hagemeister
c257baff85 Merge remote-tracking branch 'rzhxeo/youporn-hd'
Conflicts:
	youtube_dl/utils.py
2013-08-28 18:22:28 +02:00
rzhxeo
878e83c5a4 YoupornIE: Clean up extraction of hd video 2013-08-28 16:04:48 +02:00
rzhxeo
0012690aae Let aes_decrypt_text return bytes instead of unicode 2013-08-28 16:03:35 +02:00
rzhxeo
6e74bc41ca Fix division bug in aes.py 2013-08-28 16:01:43 +02:00
rzhxeo
cba892fa1f Add intlist_to_bytes to utils.py 2013-08-28 15:59:07 +02:00
rzhxeo
550bfd4cbd Merge pull request #1 from phihag/youporn-hd-pr
Allow changes to run under Python 3
2013-08-28 06:23:33 -07:00
Jaime Marquínez Ferrándiz
920ef0779b Hide the password and username in verbose mode (closes #1089) 2013-08-28 15:14:02 +02:00
Philipp Hagemeister
48ea9cea77 Allow changes to run under Python 3 2013-08-28 14:34:49 +02:00
Philipp Hagemeister
ccf4b799df Merge remote-tracking branch 'origin/master' 2013-08-28 14:02:40 +02:00
Philipp Hagemeister
f143d86ad2 [sohu] Handle encoding, and fix tests 2013-08-28 14:00:05 +02:00
Jaime Marquínez Ferrándiz
8ae97d76ee PostProcessingError holds the message in the 'msg' property, not in 'message' (fixes #1323)
Causes DeprecationWarning: http://www.python.org/dev/peps/pep-0352/
2013-08-28 13:37:31 +02:00
Philipp Hagemeister
f8b362739e Merge remote-tracking branch 'Huarong/master' 2013-08-28 13:10:59 +02:00
Philipp Hagemeister
6d69d03bac Merge remote-tracking branch 'origin/reuse_ies' 2013-08-28 13:05:21 +02:00
Philipp Hagemeister
204da0d3e3 Merge remote-tracking branch 'origin/master' 2013-08-28 12:57:44 +02:00
Philipp Hagemeister
c496ca96e7 Fix platform name in Python 2 with --verbose (Closes #1228) 2013-08-28 12:57:10 +02:00
Jaime Marquínez Ferrándiz
67b22dd036 Add extractors for video.mit.edu and techtv.mit.edu (closes #1327)
video.mit.edu just embeds the videos from techtv.mit.edu
2013-08-28 12:55:42 +02:00
Philipp Hagemeister
ce6a696e4d Remove unused imports 2013-08-28 12:47:38 +02:00
Philipp Hagemeister
a5caba1eb0 [generic] simply use urljoin 2013-08-28 12:47:27 +02:00
Pierre Rudloff
cd9c100963 Merge remote-tracking branch 'upstream/master' 2013-08-28 12:20:12 +02:00
Philipp Hagemeister
edde6c56ac Print playpath with --get-url (Fixes #1334) 2013-08-28 12:14:45 +02:00
Pierre Rudloff
b7f89fe692 Merge remote-tracking branch 'upstream/master' 2013-08-28 12:10:34 +02:00
Philipp Hagemeister
ae3531adf9 [generic] Fix URL concatenation
When the url is something like http://example.org/foo/bar?x=y  and the added is file/video.mp4 , we want http://example.org/foo/file/video.mp4
Fixes #1268.
2013-08-28 12:08:17 +02:00
Philipp Hagemeister
8cf5ee7831 Merge branch 'master' of github.com:rg3/youtube-dl 2013-08-28 11:57:18 +02:00
Philipp Hagemeister
aa3e950764 Tolerate junk at the end of gzip-compressed content (#1268) 2013-08-28 11:57:13 +02:00
Pierre Rudloff
1301a0dd42 Merge remote-tracking branch 'upstream/master' 2013-08-28 11:02:12 +02:00
Jaime Marquínez Ferrándiz
af8bd6a82d Show the time taken to download in the same format as the ETA 2013-08-28 10:56:11 +02:00
Jaime Marquínez Ferrándiz
6d38616e67 Merge pull request #1181 from h3xx/master
Add some verbosity when reporting finished downloads

Remove the mixed use of tabs and spaces for indentation.
2013-08-28 10:54:07 +02:00
Philipp Hagemeister
4f5f18acb9 [addanime] add file 2013-08-28 10:28:16 +02:00
Jaime Marquínez Ferrándiz
3e223834d9 [youtube] update algo for length 88, thanks to @Ramhack (fixes #1328) 2013-08-28 10:26:44 +02:00
Jaime Marquínez Ferrándiz
a1bb0f8773 [cnn] remove debug print call. 2013-08-28 10:20:37 +02:00
Jaime Marquínez Ferrándiz
0e283428f7 HTTPError is in urllib.error in Python 3, not in http.error 2013-08-28 10:18:39 +02:00
Philipp Hagemeister
2eabb80254 [addanime] improve 2013-08-28 04:25:38 +02:00
Philipp Hagemeister
44586389e4 [appletrailers] Add support 2013-08-28 02:18:44 +02:00
Ismael Mejia
06a401c845 Merge branch 'master' into subtitles_rework 2013-08-28 00:33:12 +02:00
Philipp Hagemeister
273f603efb [cnn] Allow more URLs 2013-08-28 00:14:19 +02:00
Philipp Hagemeister
1619e22f40 release 2013.08.28 2013-08-27 23:31:36 +02:00
Philipp Hagemeister
88a79ce6a6 Delete default user agent (Fixes #1309) 2013-08-27 23:31:24 +02:00
Philipp Hagemeister
acebc9cd6b Revert "Install our own HTTPS handler as well (#1309)"
This reverts commit 36399e8576 and fixes #1322.
2013-08-27 23:28:20 +02:00
Jaime Marquínez Ferrándiz
443c12a703 Merge pull request #1324 from whydoubt/fix_gplus
Initial slash in Google+ photos link was removed
2013-08-27 13:36:39 -07:00
Jeff Smith
7f3c4f4f65 Initial slash in Google+ photos link was removed 2013-08-27 14:38:50 -05:00
Jaime Marquínez Ferrándiz
0bc56fa66a Add an extractor for NBC news (closes #1320) 2013-08-27 12:38:57 +02:00
Jaime Marquínez Ferrándiz
1a582dd49d Add an extractor for CNN (closes #1318) 2013-08-27 11:56:48 +02:00
Pierre Rudloff
c5b921b597 Merge remote-tracking branch 'upstream/master' 2013-08-27 10:47:47 +02:00
Philipp Hagemeister
e86ea47c02 [canalc2] Small improvements 2013-08-27 10:35:20 +02:00
Philipp Hagemeister
aa5a63a5b5 Merge remote-tracking branch 'Rudloff/canalc2' 2013-08-27 10:31:46 +02:00
Jaime Marquínez Ferrándiz
2a7b4da9b2 [hark] get the song info in JSON and extract more information. 2013-08-27 10:25:38 +02:00
Jaime Marquínez Ferrándiz
069d098f84 [canalplus] Accept player.canalplus.fr urls 2013-08-27 10:21:57 +02:00
Philipp Hagemeister
b3889f7023 release 2013.08.27 2013-08-27 02:30:47 +02:00
Philipp Hagemeister
65883c8dbd Merge branch 'master' of github.com:rg3/youtube-dl 2013-08-27 02:00:23 +02:00
Philipp Hagemeister
341ca8d74c [trilulilu] Add support for trilulilu.ro
Fun fact: The ads (not yet supported) are loaded from youtube ;)
2013-08-27 01:59:00 +02:00
Allan Zhou
99859d436c Merge remote-tracking branch 'upstream/master' 2013-08-26 15:16:13 -07:00
Jaime Marquínez Ferrándiz
1b01e2b085 Merge pull request #1315 from yasoob/master
fixed tests for c56 and dailymotion
2013-08-26 13:38:48 -07:00
M.Yasoob Ullah Khalid ☺
976fc7d137 fixed tests for c56 and dailymotion 2013-08-27 01:00:17 +05:00
Philipp Hagemeister
c3b7b29c23 Merge remote-tracking branch 'origin/master' 2013-08-26 21:29:44 +02:00
Philipp Hagemeister
627a91a9a8 [generic] small typo 2013-08-26 21:29:31 +02:00
Philipp Hagemeister
6dc6302599 Merge pull request #1231 from yasoob/master
Added an IE for hark.com
2013-08-26 12:29:04 -07:00
Pierre Rudloff
7a20e2e1f8 Merge remote-tracking branch 'upstream/master' 2013-08-26 03:16:42 +02:00
Philipp Hagemeister
90648143c3 Merge pull request #1310 from rzhxeo/rtlnow
Add support for http://superrtlnow.de
2013-08-25 15:45:22 -07:00
Pierre Rudloff
5c6658d4dd Merge remote-tracking branch 'upstream/master' 2013-08-24 23:01:39 +02:00
Philipp Hagemeister
9585f890f8 [generic] add support for relative URLs (Fixes #1308) 2013-08-24 22:56:37 +02:00
Philipp Hagemeister
0838239e8e [generic] Support double slash URLs (Fixes #1309) 2013-08-24 22:52:45 +02:00
Philipp Hagemeister
36399e8576 Install our own HTTPS handler as well (#1309) 2013-08-24 22:49:22 +02:00
Philipp Hagemeister
9460db832c [ro220] Add support for 220.ro 2013-08-24 21:10:03 +02:00
rzhxeo
d68730a56e Add SUPER RTL NOW to RTLnow extractor 2013-08-24 13:22:28 +02:00
Jaime Marquínez Ferrándiz
f2aeefe29c [youtube] update algo for length 84 2013-08-24 10:48:12 +02:00
Allan Zhou
39c6f507df Merge remote-tracking branch 'upstream/master' 2013-08-23 15:33:36 -07:00
Philipp Hagemeister
d2d1eb5b0a Switch to domain yt-dl.org 2013-08-23 23:57:23 +02:00
Philipp Hagemeister
8ae7be3ef4 release 2013.08.23 2013-08-23 23:09:53 +02:00
Jaime Marquínez Ferrándiz
306170518f [youtube] update algo for length 86 (fixes #1302) 2013-08-23 22:36:59 +02:00
Jaime Marquínez Ferrándiz
aa6a10c44a Allow to specify multiple subtitles languages separated by commas (closes #518) 2013-08-23 18:34:57 +02:00
Jaime Marquínez Ferrándiz
9af73dc4fc Print a message before embedding the subtitles 2013-08-23 18:17:43 +02:00
Jaime Marquínez Ferrándiz
fc483bb6af [xhamster] use determine_ext 2013-08-23 17:23:34 +02:00
Jaime Marquínez Ferrándiz
53b0f3e4e2 Merge pull request #1301 from rzhxeo/xhamster
XHamsterIE: Fix video extension and add video description
2013-08-23 17:21:30 +02:00
rzhxeo
4353cf51a0 XHamsterIE: Add video description 2013-08-23 16:40:20 +02:00
rzhxeo
ce34e9ce5e XHamsterIE: Fix video extension
Cut off GET parameter
2013-08-23 16:33:41 +02:00
Jaime Marquínez Ferrándiz
d4051a8e05 Add a post processor for embedding subtitles in mp4 videos (closes #1052) 2013-08-23 15:06:19 +02:00
Jaime Marquínez Ferrándiz
df3df7fb64 [youtube] Fix download of subtitles with '--all-subs'
If _extract_subtitles is called the option 'write subtitles' is always true.
2013-08-23 13:14:22 +02:00
Jaime Marquínez Ferrándiz
9e9c164052 Merge pull request #937 from jaimeMF/subtitles_rework
Subtitles rework
2013-08-23 02:40:25 -07:00
Jaime Marquínez Ferrándiz
066090dd3f [youtube] add algo for length 80 and update player info 2013-08-23 11:33:56 +02:00
Allan Zhou
614d9c19c1 Merge remote-tracking branch 'upstream/master' 2013-08-22 17:02:41 -07:00
Ismael Mejia
bd2dee6c67 Merge branch 'master' into subtitles_rework 2013-08-23 01:47:10 +02:00
Philipp Hagemeister
74e6672beb Merge pull request #1297 from iemejia/master
[subtitles] separated subtitle options in their own group
2013-08-22 16:30:14 -07:00
Philipp Hagemeister
02bcf0d389 release 2013.08.22 2013-08-22 23:29:42 +02:00
Ismael Mejia
18b4e04f1c Merge branch 'master' into subtitles_rework 2013-08-22 23:29:36 +02:00
Jaime Marquínez Ferrándiz
10204dc898 [videofyme] Add an additional quality (they change between downloads of the info) and update md5 sum of the test video 2013-08-22 23:23:52 +02:00
Ismael Mejia
1865ed31b9 [subtitles] separated subtitle options in their own group 2013-08-22 22:44:04 +02:00
Jaime Marquínez Ferrándiz
3669cdba10 [youtube] update algo for length 82 (fixes #1296) 2013-08-22 22:35:15 +02:00
Jaime Marquínez Ferrándiz
939fbd26ac [youtube] fix the order of DASH formats 2013-08-22 19:45:24 +02:00
Allan Zhou
b4e60dac23 Merge remote-tracking branch 'upstream/master' 2013-08-22 10:43:51 -07:00
Jaime Marquínez Ferrándiz
e6ddb4e7af Merge pull request #1279 from xanadu/master
Add YouTube DASH formats to YouTubeIE
2013-08-22 19:33:34 +02:00
Jaime Marquínez Ferrándiz
83390b83d9 Merge pull request #1266 from MiLk/py-generator
Update the youtube algorithm generator
2013-08-22 10:18:58 -07:00
Pierre Rudloff
ff2424595a lxml is not part of the standard library. 2013-08-22 14:47:51 +02:00
Pierre Rudloff
adeb9c73d6 Merge remote-tracking branch 'upstream/master' 2013-08-22 14:04:30 +02:00
Pierre Rudloff
cd0abcc0bb Extractor for canalc2.tv 2013-08-22 13:54:23 +02:00
Jaime Marquínez Ferrándiz
4a55479fa9 Credit Pierre Rudloff for JeuxVideoIE and UnistraIE 2013-08-22 13:21:32 +02:00
Jaime Marquínez Ferrándiz
f527115b5f Rename utv.py to unistra.py and extract more info
There are other sites that could be named utv, which would conflict if they are added
2013-08-22 13:19:35 +02:00
Pierre Rudloff
75e1b46add Download from utv.unistra.fr (PR #1271)
Squashed to a single commit to keep the file 'youtube-dl' unchanged and remove the revert commit.
2013-08-22 12:58:12 +02:00
Pierre Rudloff
05a2926c5c Merge remote-tracking branch 'upstream/master' 2013-08-22 12:55:58 +02:00
Pierre Rudloff
7070b83687 Merge remote-tracking branch 'upstream/master' 2013-08-22 12:54:17 +02:00
Pierre Rudloff
8d212e604a Merge remote-tracking branch 'upstream/master'
Conflicts:
	youtube_dl/extractor/jeuxvideo.py
2013-08-22 12:52:05 +02:00
Jaime Marquínez Ferrándiz
063fcc9676 [jeuxvideo] Extract more information and add test 2013-08-22 12:37:34 +02:00
Jaime Marquínez Ferrándiz
8403612258 Merge pull request #1267 from Rudloff/master
Download videos from jeuxvideo.com

Edited to keep the file 'youtube-dl' unchanged.
2013-08-22 12:25:21 +02:00
Pierre Rudloff
25b51c7816 Download videos from jeuxvideo.com 2013-08-22 12:12:34 +02:00
Jaime Marquínez Ferrándiz
9779b63bb6 Add an extractor for PBS (closes #870 and #873) 2013-08-22 11:57:21 +02:00
Jaime Marquínez Ferrándiz
d81aef3adf Add an extractor for tv.slashdot.org (closes #1192)
It uses the ooyala platform, so it just extracts the ooyala url.
2013-08-21 21:51:58 +02:00
Allan Zhou
5af7e056a7 Merge remote-tracking branch 'upstream/master' 2013-08-21 10:53:42 -07:00
Jaime Marquínez Ferrándiz
45ed795cb0 [youtube] update uploader name for a test video: 'IconaPop' has changed to 'Icona Pop' 2013-08-21 19:28:48 +02:00
Jaime Marquínez Ferrándiz
683e98a8a4 [statigram] change test video
The old one cannot be accessed.
2013-08-21 19:20:27 +02:00
Jaime Marquínez Ferrándiz
e0cfeb2ea7 [funnyordie] fix extraction of video url and title 2013-08-21 18:58:25 +02:00
Jaime Marquínez Ferrándiz
75340ee383 [vevo] Fix urls with a query (#1258) 2013-08-21 18:20:03 +02:00
Jaime Marquínez Ferrándiz
668de34c6b [soundcloud] Support widget urls (fixes #1252) 2013-08-21 17:06:37 +02:00
Jaime Marquínez Ferrándiz
a91b954bb4 [vimeo] extract information for Vimeo Pro videos from http://player.vimeo.com/video/{video_id} (fixes #1197)
For some videos https://vimeo.com/{video_id} doesn't work
2013-08-21 13:48:19 +02:00
Allan Zhou
a3f62b8255 Merge remote-tracking branch 'upstream/master' 2013-08-21 00:07:03 -07:00
Allan Zhou
37b6d5f684 fix hls test 2013-08-20 23:51:05 -07:00
Allan Zhou
b7a6838407 address review comment 2013-08-20 21:57:32 -07:00
Allan Zhou
cde846b3d3 fix code style 2013-08-20 21:42:49 -07:00
Philipp Hagemeister
6c3e6e88d3 Allow hours in ETA display (Fixes #1280) 2013-08-21 05:44:19 +02:00
Philipp Hagemeister
739674cd77 [rtlnow] Add support for error message for queries from outside of Germany 2013-08-21 05:24:58 +02:00
Philipp Hagemeister
4b2d7cae11 release 2013.08.21 2013-08-21 04:33:57 +02:00
Philipp Hagemeister
7fea7156cb [generic] support HTML5 video 2013-08-21 04:32:22 +02:00
Philipp Hagemeister
3093468977 [generic] Ignore stupid HTTP servers (#1284) 2013-08-21 04:32:07 +02:00
Philipp Hagemeister
79cb25776f Cache suitable regular expressions
This speeds up TestAllURLsMatching.test_no_duplicates by about 8000% at the cost of minimal memory overhead.
2013-08-21 04:06:48 +02:00
Philipp Hagemeister
87f78946a5 [collegehumor] Allow old-style videos (Fixes #1285) 2013-08-21 03:50:56 +02:00
Allan Zhou
211fbc1328 fix failed tests 2013-08-19 18:57:55 -07:00
Allan Zhou
836a086ce9 Add YouTube DASH formats to YouTubeIE 2013-08-19 18:22:25 -07:00
Allan Zhou
90d3989b99 Merge remote-tracking branch 'upstream/master' 2013-08-19 17:11:52 -07:00
Philipp Hagemeister
d741e55a42 [youtube] Support watch_popup URLs (Fixes #1275) 2013-08-19 10:27:42 +02:00
Philipp Hagemeister
17d3aaaf16 Merge pull request #1273 from rzhxeo/rtlnow
Add support for http://voxnow.de
2013-08-19 00:19:06 -07:00
rzhxeo
ea55b2a4ca Add VOXnow to RTLnow extractor 2013-08-19 08:57:36 +02:00
Philipp Hagemeister
3f0537dd4a Merge remote-tracking branch 'rzhxeo/rtlnow' 2013-08-19 00:25:34 +02:00
Pierre Rudloff
943f7f7a39 Download videos from jeuxvideo.com 2013-08-18 16:11:47 +02:00
Emilien Kenler
12e895fc5a Merge branch 'master' into py-generator 2013-08-18 11:12:38 +02:00
Emilien Kenler
bda2c49d75 Update algo - see #1254
Signed-off-by: Emilien Kenler <hello@emilienkenler.com>
2013-08-18 11:10:39 +02:00
rzhxeo
01b32990da Add RTLnow extractor 2013-08-18 08:16:53 +02:00
rzhxeo
dbda1b5147 Add RTLnow extractor
Supports http://rtl2now.rtl2.de and http://rtl-now.rtl.de
2013-08-18 08:15:18 +02:00
Philipp Hagemeister
ddf3bd328b release 2013.08.17 2013-08-17 08:33:36 +02:00
Philipp Hagemeister
b9c37b92cf Merge pull request #1256 from patrickslin/patch-5
Length 85 changed again? (fixes #1254)
2013-08-16 14:07:49 -07:00
kkalpakloglou
5a27ecdd2e Update AddAnime.py 2013-08-16 23:54:09 +03:00
patrickslin
f9c3c90ca8 Length 85 changed again? (fixes #1254) 2013-08-16 08:54:01 -07:00
Philipp Hagemeister
6daccbe317 release 2013.08.15 2013-08-15 22:40:00 +02:00
Philipp Hagemeister
71ea844c0e Merge pull request #1248 from patrickslin/patch-4
Unable to Download Video (fixes #1247)
2013-08-15 13:38:32 -07:00
patrickslin
3a7256697e Unable to Download Video (fixes #1247) 2013-08-15 13:00:20 -07:00
Philipp Hagemeister
d1ba998274 release 2013.08.14 2013-08-14 10:19:53 +02:00
Philipp Hagemeister
718ced8d8c Merge pull request #1239 from patrickslin/patch-3
Updated Vevo Signature Length (fixes #1237)
2013-08-14 01:18:58 -07:00
patrickslin
e1842025d0 Updated Vevo Signature Length (fixes #1237) 2013-08-13 17:57:35 -07:00
Emilien Kenler
2b9213cdc1 Update generator
Signed-off-by: Emilien Kenler <hello@emilienkenler.com>
2013-08-12 10:48:40 +02:00
Yasoob
e3a88568b0 Added an IE for hark.com 2013-08-11 22:23:05 +05:00
Philipp Hagemeister
0577177e3e [vevo] fix testcase 2013-08-11 07:12:38 +02:00
Philipp Hagemeister
298f833b16 Note update possibility on errors (thanks @chbrown, #1229) 2013-08-11 06:46:24 +02:00
rzhxeo
97b3656c2e YoupornIE: Add support for hd videos and update Test 2013-08-09 18:37:33 +02:00
rzhxeo
f3bcebb1d2 add an aes implementation 2013-08-09 18:36:01 +02:00
Philipp Hagemeister
0f399e6e5e release 2013.08.09 2013-08-09 15:49:09 +02:00
Philipp Hagemeister
5b075e27cb Merge pull request #1218 from patrickslin/patch-2
New sig len 89 algo
2013-08-09 03:42:13 -07:00
patrickslin
8a9d86a2a7 New sig len 89 algo
Fixes new YT encrypted sig len 89.
2013-08-08 21:48:12 -07:00
Ismael Mejia
d80a064eff [subtitles] Added tests to check correct behavior when no subtitles are
available
2013-08-08 22:22:33 +02:00
Philipp Hagemeister
d468a09789 release 2013.08.08.1 2013-08-08 20:45:16 +02:00
Philipp Hagemeister
9f4ab73d7f Merge pull request #1216 from patrickslin/patch-5
Invalid signature again (fixes #1215)
2013-08-08 11:44:29 -07:00
patrickslin
02cf62e240 Invalid signature again (fixes #1215) 2013-08-08 11:28:50 -07:00
Ismael Mejia
d55de6eec2 [subtitles] Skips now the subtitles that has already been downloaded.
Just a validation for file exists, I also removed a method that wasn't
been used because it was a copy paste from FileDownloader.
2013-08-08 18:30:04 +02:00
Ismael Mejia
69df680b97 [subtitles] Improved docs + new class for servers who don't support
auto-caption
2013-08-08 11:20:56 +02:00
Ismael Mejia
447591e1ae [test] Cleaned subtitles tests 2013-08-08 11:03:52 +02:00
Ismael Mejia
33eb0ce4c4 [subtitles] removed only-sub option (--skip-download achieves the same
functionality)
2013-08-08 10:06:24 +02:00
Ismael Mejia
505c28aac9 Separated subtitle options in their own group 2013-08-08 09:53:25 +02:00
Philipp Hagemeister
67fb0c5495 Merge branch 'master' of github.com:rg3/youtube-dl 2013-08-08 08:56:59 +02:00
Philipp Hagemeister
4efba05c56 Clarify template error message (#1209) 2013-08-08 08:55:26 +02:00
Ismael Mejia
8377574c9c [internal] Improved subtitle architecture + (update in
youtube/dailymotion)

The structure of subtitles was refined, you only need to implement one
method that returns a dictionnary of the available subtitles (lang, url) to
support all the subtitle options in a website. I updated the subtitle
downloaders for youtube/dailymotion to show how it works.
2013-08-08 08:54:10 +02:00
Philipp Hagemeister
0f90943e45 Merge pull request #1189 from cyisfor/master
More informative error
2013-08-07 17:30:28 -07:00
Philipp Hagemeister
526e638c8a release 2013.08.08 2013-08-08 00:39:23 +02:00
Ismael Mejia
372297e713 Undo the previous commit (it was a mistake) 2013-08-07 21:24:42 +02:00
Philipp Hagemeister
356e067390 Merge remote-tracking branch 'patrickslin/patch-4' 2013-08-07 20:19:51 +02:00
Philipp Hagemeister
e2f48f9643 Remove youtube sig tests
The signature algo changes too often for the static test to make sense.
2013-08-07 20:11:40 +02:00
Philipp Hagemeister
b513a251f8 Merge commit '7a4c6cc92f9ffec9135652a49153caffa5520c29' 2013-08-07 20:11:04 +02:00
Ismael Mejia
953e32b2c1 [dailymotion] Added support for subtitles + new InfoExtractor for
generic subtitle download.

The idea is that all subtitle downloaders must descend from SubtitlesIE
and implement only three basic methods to achieve the complete subtitle
download functionality. This will allow to reduce the code in YoutubeIE
once it is rewritten.
2013-08-07 18:59:11 +02:00
Ismael Mejia
5898e28272 Fixed small type issue 2013-08-07 18:48:24 +02:00
Ismael Mejia
67dfbc0cb9 Added exceptions for the subtitle and video types in .gitignore 2013-08-07 18:42:40 +02:00
patrickslin
36cb11f068 Encrypted sig 87 broken again (fixes #1200) 2013-08-06 21:35:37 -07:00
AndreiArba
7a4c6cc92f Updated the 84 length signature decryption
Updated the right 84 length signature decryption 06.08.2013
2013-08-06 15:41:13 +03:00
user
7edcb8f39c More informative error 2013-08-05 19:43:09 -07:00
huohuarong
d5b00ee6e0 improve sohu extractor 2013-08-06 10:26:57 +08:00
tsantala
461cead4f7 changes 2013-08-06 04:34:24 +03:00
huohuarong
b5a6d40818 fix parse title bug 2013-08-05 22:51:54 +08:00
Dan Church
968b5e0112 Add some verbosity when reporting finished downloads
For example:

    [download] Resuming download at byte 1868140
    [download] Destination: Entry #1-Bn59FJ4HrmU.flv
    [download] 100% of 3.27MiB in 4s

This format is meant to somewhat mirror the behavior of wget(1) when reporting finished downloads:

    100%[==================>] 54,836,682   788KB/s   in 74s

    2013-08-04 12:32:05 (728 KB/s) - 'google-chrome-stable_current_x86_64.rpm' saved [54836682/54836682]
2013-08-04 12:45:24 -05:00
Jaime Marquínez Ferrándiz
39b782b390 [collegehumor] support urls in the format www.collegehumor.com/e/{video_id} (fixes #1179) 2013-08-04 16:36:48 +02:00
Jaime Marquínez Ferrándiz
577664c8e8 Add an extractor from muzu.tv (closes #1177) 2013-08-04 11:10:57 +02:00
Jaime Marquínez Ferrándiz
bba12cec89 Add an extractor for videofy.me (closes #1171)
Also modify find_xpath_attr to accept values with spaces like for id="HQ on"
2013-08-03 22:50:27 +02:00
Jaime Marquínez Ferrándiz
70c4c03cb8 [arte] add support for downloading from http://liveweb.arte.tv (fixes #1014) 2013-08-03 19:07:04 +02:00
Jaime Marquínez Ferrándiz
f5791ed136 [arte] Prefer vídeos without subtitles in the same language (fixes #1173) and fix crash when there's no description 2013-08-03 17:32:29 +02:00
huohuarong
4ec929dc9b use ..utils/clean_html() 2013-08-03 10:29:58 +08:00
Jaime Marquínez Ferrándiz
fbf189a6ee [myvideo] add support for videos that place the video info inside www.myvideo.de/service/data/video/{id}/config (fixes #616) 2013-08-02 21:09:17 +02:00
Jaime Marquínez Ferrándiz
09825cb5c0 Add an extractor for Ooyala (closes #833)
Only works for some sites, it doesn't work for videos that use a f4m manifest
2013-08-02 16:53:16 +02:00
Jaime Marquínez Ferrándiz
ed27d35674 [youtube] don't crash in verbose mode if 'ad3_module' is not defined in age protected videos (fixes #1159) 2013-08-02 14:17:01 +02:00
Philipp Hagemeister
fd5539eb41 release 2013.08.02 2013-08-02 13:35:13 +02:00
Jaime Marquínez Ferrándiz
04bca64bde [youtube]: new algo for length 83 (fixes #1164) 2013-08-02 12:38:17 +02:00
Jaime Marquínez Ferrándiz
03cc7c20c1 [youtube] show which formats are in 3D with "-F" and in the format field 2013-08-02 12:21:28 +02:00
Jaime Marquínez Ferrándiz
4075311d94 Merge pull request #1163 from xanadu/master
add support for download YouTube 3d format of 3d content
2013-08-02 12:06:34 +02:00
huohuarong
6624a2b07d add an extractor for tv.sohu.com 2013-08-02 17:58:46 +08:00
huohuarong
6d3a7d03e1 fix bug: kankan extractor not support http://vod.kankan.com/v/70/70309.shtml 2013-08-02 15:26:11 +08:00
Allan Zhou
95fdc7d69c Merge remote-tracking branch 'upstream/master' 2013-08-01 10:57:12 -07:00
Allan Zhou
86fe61c8f9 add support for download YouTube 3d format of 3d content 2013-08-01 10:47:48 -07:00
Jaime Marquínez Ferrándiz
9bb6d2f21d Merge pull request #1161 from meyerd/master
Fix regex error when only subtitled video is available on arte.
2013-08-01 04:49:05 -07:00
Dominik
e3f4593e76 Fix regex error when only subtitled video is available on arte. 2013-08-01 11:48:17 +02:00
Jaime Marquínez Ferrándiz
1d043b93cf [youtube] Add support for downloading videos with hlsvp (fixes #1083)
They are downloaded with a m3u8 manifest, they seem to be encrypted, but ffmpeg can handle them.
2013-07-31 23:41:05 +02:00
Jaime Marquínez Ferrándiz
b15d4f624f Allow to download from m3u8 manifests with ffmpeg
They are detected by the extension of the url.
2013-07-31 22:33:37 +02:00
Jaime Marquínez Ferrándiz
4aa16a50f5 Log a better error message if ffprobe or avconv are not found (related #1134) 2013-07-31 21:22:08 +02:00
Jaime Marquínez Ferrándiz
bbcbf4d459 Switch some calls to to_stderr to report_error and report_warning 2013-07-31 21:20:46 +02:00
Philipp Hagemeister
930ad9eecc release 2013.07.31 2013-07-31 10:55:02 +02:00
Jaime Marquínez Ferrándiz
b072a9defd YoutubeIE: with age protected videos, add a missing "return" to return the signature decrypted with _decrypt_signature 2013-07-31 10:51:00 +02:00
Jaime Marquínez Ferrándiz
75952c6e3d YoutubeIE: new algo for length 86 (fixes #1156)
Now is using the same length as the flash player used for age protected videos, but the algorithm is different, so now for age protected videos it first tries to use the old algo.
2013-07-31 10:45:13 +02:00
Jaime Marquínez Ferrándiz
05afc96b73 Print urls from the batch file with --verbose (related #1155) 2013-07-30 23:11:44 +02:00
Jaime Marquínez Ferrándiz
fa80026915 Disable way and tf1 tests, the whole videos are served sometimes, so the md5 sum doesn't match. 2013-07-30 11:19:07 +02:00
Jaime Marquínez Ferrándiz
2bc3de0f28 [worldstarhiphop] Small cleanup
The second check for the Vevo id is not necessary.
2013-07-30 11:10:17 +02:00
Jaime Marquínez Ferrándiz
99c7bc94af Merge pull request #1148 from JohnyMoSwag/master
[worldstarhiphop] support vevo videos
2013-07-30 11:05:40 +02:00
Jaime Marquínez Ferrándiz
152c8f349d Merge pull request #1149 from pishposhmcgee/patch-3
[vevo] Modified m_urls regex and video_url
2013-07-30 01:57:37 -07:00
Johny Mo Swag
d75654c15e using re.search 2013-07-29 14:39:14 -07:00
Jaime Marquínez Ferrándiz
0725f584e1 [wat] fix the extraction of the video url (fixes #1103)
Use the direct download link for Android.
2013-07-29 23:38:02 +02:00
Jaime Marquínez Ferrándiz
8cda9241d1 Add an extractor for kankan.com (closes #1133) 2013-07-29 23:13:12 +02:00
pishposhmcgee
a3124ba49f Modified m_urls regex and video_url
Some videos have a leading slash, some do not
2013-07-29 15:45:20 -05:00
Johny Mo Swag
579e2691fe detect vevo embed fix 2013-07-29 12:24:26 -07:00
Johny Mo Swag
63f05de10b detect vevo embed 2013-07-29 12:11:57 -07:00
Jaime Marquínez Ferrándiz
caeefc29eb [vimeo] add an extractor for channels 2013-07-29 13:12:09 +02:00
Jaime Marquínez Ferrándiz
a3c736def2 [dailymotion] Add an extractor for Dailymotion playlists 2013-07-29 12:07:38 +02:00
Jaime Marquínez Ferrándiz
58261235f0 Add an extractor for roxwell.com (closes #1044) 2013-07-26 13:00:59 +02:00
Philipp Hagemeister
da70877a1b release 2013.07.25.2 2013-07-25 22:58:40 +02:00
Jaime Marquínez Ferrándiz
5c468ca8a8 YoutubeIE: add algo for length 79 (fixes #1126) 2013-07-25 22:50:24 +02:00
Jaime Marquínez Ferrándiz
aedd6bb97d YoutubeIE: new algo for length 81 (fixes #1127) 2013-07-25 22:06:53 +02:00
Jaime Marquínez Ferrándiz
733d9cacb8 Merge pull request #1120 from pishposhmcgee/patch-1
[collegehumor] Added an option 'e' to go with 'video' or 'embed'
2013-07-25 01:14:43 -07:00
Philipp Hagemeister
42f2805e48 [keek] Fix testcase (Broken by accident in 6625f82940) 2013-07-25 10:10:37 +02:00
Philipp Hagemeister
0ffcb7c6fc release 2013.07.25.1 2013-07-25 09:53:15 +02:00
Philipp Hagemeister
27669bd11d [ina] Allow I at start of video IDs 2013-07-25 09:52:58 +02:00
Philipp Hagemeister
6625f82940 [keek] Allow httpS URLs (Fixes #1123) 2013-07-25 09:40:19 +02:00
Philipp Hagemeister
d0866f0bb4 release 2013.07.25 2013-07-25 09:35:25 +02:00
Philipp Hagemeister
09eeb75130 Merge remote-tracking branch 'pishposhmcgee/patch-2' 2013-07-25 09:34:56 +02:00
Philipp Hagemeister
0a99956f71 [ina] Fix URL detection (Fixes #1121) 2013-07-25 09:34:12 +02:00
pishposhmcgee
12ef6aefa8 changed video_url regex
Some older videos contain an extra properties such as 'embed' before 'type'.
2013-07-24 21:51:08 -05:00
pishposhmcgee
e93aa81aa6 Added an option 'e' to go with 'video' or 'embed'
Based on links that I've seen, /e/<videoid> also occurs in the wild, and making this substitution yields effective results.
2013-07-24 16:55:28 -05:00
Jaime Marquínez Ferrándiz
755eb0320e [youtube] use itertools.count instead of a "while True" loop and a manual counter 2013-07-24 22:27:33 +02:00
Jaime Marquínez Ferrándiz
43ba5456b1 [youtube] add an extractor for the "Watch Later" list 2013-07-24 22:13:39 +02:00
Philipp Hagemeister
156d5ad6da release 2013.07.24.2 2013-07-24 21:18:41 +02:00
Jaime Marquínez Ferrándiz
c626a3d9fa Add an extractor for downloading the Youtube favorite videos(closes #127) 2013-07-24 20:45:19 +02:00
Jaime Marquínez Ferrándiz
b2e8bc1b20 YoutubeIE: Move the code from _real_initialize to a base class
This allows to reuse the code in other IEs without having to overwrite some parts.
2013-07-24 20:40:12 +02:00
Jaime Marquínez Ferrándiz
771822ebb8 YoutubePlaylistIE: break only if there's no entry field in the response
Otherwise the Favorite videos playlist cannot be downloaded complete.
Also break if it reach the maximum value of the start-index.
2013-07-24 20:14:55 +02:00
Jaime Marquínez Ferrándiz
eb6a41ba0f ExfmIE: extract Soundcloud songs using SoundcloudIE
Now SouncloudIE accepts api urls.
2013-07-24 14:39:21 +02:00
Jaime Marquínez Ferrándiz
7d2392691c [soundcloud]: Some improvements
Extract thumbnails.
Make SoundcloudSetIE a subclass of SoundcloudIE to reuse some code.
Directly extract the file url without downloading an extra page.
2013-07-24 14:15:12 +02:00
Philipp Hagemeister
c216c1894d release 2013.07.24.1 2013-07-24 13:52:55 +02:00
Jaime Marquínez Ferrándiz
3e1ad508eb Add Youtube player info for length 87 2013-07-24 12:48:25 +02:00
Jaime Marquínez Ferrándiz
a052c1d785 Merge pull request #1114 from alexvh/traileraddict_hd
[traileraddict] Obtain hd quality stream if available

Updated md5 checksum of the test video.
2013-07-24 10:52:24 +02:00
Jaime Marquínez Ferrándiz
16484d4923 [traileraddict]: Support clips urls and more trailer urls 2013-07-24 10:43:44 +02:00
Jaime Marquínez Ferrándiz
32a09b4382 Merge pull request #1113 from alexvh/master
[traileraddict] Allow all types of trailer URLs
2013-07-24 10:37:52 +02:00
Philipp Hagemeister
870a7e6156 release 2013.07.24 2013-07-24 10:29:34 +02:00
patrickslin
239e3e0cca YoutubeIE: new algo for length 87 (fixes #1105)
Squashed commit from the pull requests #1107, #1109 and #1110.
2013-07-24 10:20:52 +02:00
Alex Van't Hof
b1ca5e3ffa [traileraddict] Obtain hd quality stream if available
No clear method for determining if hd is available so opted to just
check for presence of hd toggle function.
2013-07-24 02:42:32 -04:00
Alex Van't Hof
b9a1252c96 [traileraddict] Allow all types of trailer URLs
Valid url regex for traileraddict.com is too strict. Need to allow,
e.g. theatrical-trailer, teaser-trailer, feature-read-band-trailer, etc.
2013-07-24 00:48:11 -04:00
Jaime Marquínez Ferrándiz
6804038d06 Don't try to write the subtitles if it's None 2013-07-20 12:59:47 +02:00
Jaime Marquínez Ferrándiz
2f799533ae YoutubeIE: don't crash when trying to get automatic captions if the videos has standard subtitles. 2013-07-20 12:56:10 +02:00
Jaime Marquínez Ferrándiz
88ae5991cd YoutubeIE: use the same function for getting the subtitles for the "--write-sub" and "--all-sub" options 2013-07-20 12:56:06 +02:00
Jaime Marquínez Ferrándiz
5d51a883c2 Use a dictionary for storing the subtitles
The errors while getting the subtitles are reported as warnings, if no subtitles are found return and empty dict.
2013-07-20 12:52:25 +02:00
Jaime Marquínez Ferrándiz
c4a91be726 Save subtitles using the same code for all the options 2013-07-20 12:52:24 +02:00
alphapapa
0025da15cf Clarify that download rate is in bytes per second
I found f918ec7ea2 but it is still not clear to anyone who hasn't read Issue #723 whether the limit is in bits or bytes.  This is doubly confusing because 1) ISPs usually advertise speeds in bits per second, and 2) lowercase "k" and "m" are often used in correlation with bits rather than bytes.
2013-07-13 16:42:16 -05:00
Jaime Marquínez Ferrándiz
56c7366547 YoutubeIE: reuse instances of InfoExtractors (closes #998)
When a IE is added to the list, it's also added to a dictionary. When a IE is requested it first looks in the dictionary and if there's no instance it will create a new one.

That way _real_initialize is only called once for each IE, saving time if it needs to login for example.
2013-07-08 15:14:27 +02:00
129 changed files with 7807 additions and 1631 deletions

10
.gitignore vendored
View File

@@ -17,4 +17,12 @@ youtube-dl.tar.gz
.coverage
cover/
updates_key.pem
*.egg-info
*.egg-info
*.srt
*.sbv
*.vtt
*.flv
*.mp4
*.part
test/testdata
.tox

View File

@@ -19,7 +19,8 @@ which means you can modify it, redistribute it or use it however you like.
-U, --update update this program to latest version. Make sure
that you have sufficient permissions (run with
sudo if needed)
-i, --ignore-errors continue on download errors
-i, --ignore-errors continue on download errors, for example to to
skip unavailable videos in a playlist
--dump-user-agent display the current browser identification
--user-agent UA specify a custom user agent
--referer REF specify a custom referer, use if the video access
@@ -29,6 +30,11 @@ which means you can modify it, redistribute it or use it however you like.
--extractor-descriptions Output descriptions of all supported extractors
--proxy URL Use the specified HTTP/HTTPS proxy
--no-check-certificate Suppress HTTPS certificate validation.
--cache-dir None Location in the filesystem where youtube-dl can
store downloaded information permanently. By
default $XDG_CACHE_HOME/youtube-dl or ~/.cache
/youtube-dl .
--no-cache-dir Disable filesystem caching
## Video Selection:
--playlist-start NUMBER playlist video to start at (default is 1)
@@ -45,11 +51,16 @@ which means you can modify it, redistribute it or use it however you like.
--date DATE download only videos uploaded in this date
--datebefore DATE download only videos uploaded before this date
--dateafter DATE download only videos uploaded after this date
--no-playlist download only the currently playing video
--age-limit YEARS download only videos suitable for the given age
--download-archive FILE Download only videos not present in the archive
file. Record all downloaded videos in it.
## Download Options:
-r, --rate-limit LIMIT maximum download rate (e.g. 50k or 44.6m)
-r, --rate-limit LIMIT maximum download rate in bytes per second (e.g.
50K or 4.2M)
-R, --retries RETRIES number of retries (default is 10)
--buffer-size SIZE size of download buffer (e.g. 1024 or 16k)
--buffer-size SIZE size of download buffer (e.g. 1024 or 16K)
(default is 1024)
--no-resize-buffer do not automatically adjust the buffer size. By
default, the buffer size is automatically resized
@@ -90,6 +101,7 @@ which means you can modify it, redistribute it or use it however you like.
file modification time
--write-description write video description to a .description file
--write-info-json write video metadata to a .info.json file
--write-annotations write video annotations to a .annotation file
--write-thumbnail write thumbnail image to disk
## Verbosity / Simulation Options:
@@ -113,25 +125,26 @@ which means you can modify it, redistribute it or use it however you like.
## Video Format Options:
-f, --format FORMAT video format code, specifiy the order of
preference using slashes: "-f 22/17/18"
preference using slashes: "-f 22/17/18". "-f mp4"
and "-f flv" are also supported
--all-formats download all available video formats
--prefer-free-formats prefer free video formats unless a specific one
is requested
--max-quality FORMAT highest quality format to download
-F, --list-formats list all available formats (currently youtube
only)
--write-sub write subtitle file (currently youtube only)
--write-auto-sub write automatic subtitle file (currently youtube
only)
--only-sub [deprecated] alias of --skip-download
## Subtitle Options:
--write-sub write subtitle file
--write-auto-sub write automatic subtitle file (youtube only)
--all-subs downloads all the available subtitles of the
video (currently youtube only)
video
--list-subs lists all available subtitles for the video
(currently youtube only)
--sub-format FORMAT subtitle format [srt/sbv/vtt] (default=srt)
(currently youtube only)
--sub-lang LANG language of the subtitles to download (optional)
use IETF language tags like 'en'
--sub-format FORMAT subtitle format (default=srt) ([sbv/vtt] youtube
only)
--sub-lang LANGS languages of the subtitles to download (optional)
separated by commas, use IETF language tags like
'en,pt'
## Authentication Options:
-u, --username USERNAME account username
@@ -153,6 +166,9 @@ which means you can modify it, redistribute it or use it however you like.
processing; the video is erased by default
--no-post-overwrites do not overwrite post-processed files; the post-
processed files are overwritten by default
--embed-subs embed subtitles in the video (only for mp4
videos)
--add-metadata add metadata to the files
# CONFIGURATION

View File

@@ -4,8 +4,12 @@ __youtube-dl()
COMPREPLY=()
cur="${COMP_WORDS[COMP_CWORD]}"
opts="{{flags}}"
keywords=":ytfavorites :ytrecommended :ytsubscriptions :ytwatchlater"
if [[ ${cur} == * ]] ; then
if [[ ${cur} =~ : ]]; then
COMPREPLY=( $(compgen -W "${keywords}" -- ${cur}) )
return 0
elif [[ ${cur} == * ]] ; then
COMPREPLY=( $(compgen -W "${opts}" -- ${cur}) )
return 0
fi

405
devscripts/buildserver.py Normal file
View File

@@ -0,0 +1,405 @@
#!/usr/bin/python3
from http.server import HTTPServer, BaseHTTPRequestHandler
from socketserver import ThreadingMixIn
import argparse
import ctypes
import functools
import sys
import threading
import traceback
import os.path
class BuildHTTPServer(ThreadingMixIn, HTTPServer):
allow_reuse_address = True
advapi32 = ctypes.windll.advapi32
SC_MANAGER_ALL_ACCESS = 0xf003f
SC_MANAGER_CREATE_SERVICE = 0x02
SERVICE_WIN32_OWN_PROCESS = 0x10
SERVICE_AUTO_START = 0x2
SERVICE_ERROR_NORMAL = 0x1
DELETE = 0x00010000
SERVICE_STATUS_START_PENDING = 0x00000002
SERVICE_STATUS_RUNNING = 0x00000004
SERVICE_ACCEPT_STOP = 0x1
SVCNAME = 'youtubedl_builder'
LPTSTR = ctypes.c_wchar_p
START_CALLBACK = ctypes.WINFUNCTYPE(None, ctypes.c_int, ctypes.POINTER(LPTSTR))
class SERVICE_TABLE_ENTRY(ctypes.Structure):
_fields_ = [
('lpServiceName', LPTSTR),
('lpServiceProc', START_CALLBACK)
]
HandlerEx = ctypes.WINFUNCTYPE(
ctypes.c_int, # return
ctypes.c_int, # dwControl
ctypes.c_int, # dwEventType
ctypes.c_void_p, # lpEventData,
ctypes.c_void_p, # lpContext,
)
def _ctypes_array(c_type, py_array):
ar = (c_type * len(py_array))()
ar[:] = py_array
return ar
def win_OpenSCManager():
res = advapi32.OpenSCManagerW(None, None, SC_MANAGER_ALL_ACCESS)
if not res:
raise Exception('Opening service manager failed - '
'are you running this as administrator?')
return res
def win_install_service(service_name, cmdline):
manager = win_OpenSCManager()
try:
h = advapi32.CreateServiceW(
manager, service_name, None,
SC_MANAGER_CREATE_SERVICE, SERVICE_WIN32_OWN_PROCESS,
SERVICE_AUTO_START, SERVICE_ERROR_NORMAL,
cmdline, None, None, None, None, None)
if not h:
raise OSError('Service creation failed: %s' % ctypes.FormatError())
advapi32.CloseServiceHandle(h)
finally:
advapi32.CloseServiceHandle(manager)
def win_uninstall_service(service_name):
manager = win_OpenSCManager()
try:
h = advapi32.OpenServiceW(manager, service_name, DELETE)
if not h:
raise OSError('Could not find service %s: %s' % (
service_name, ctypes.FormatError()))
try:
if not advapi32.DeleteService(h):
raise OSError('Deletion failed: %s' % ctypes.FormatError())
finally:
advapi32.CloseServiceHandle(h)
finally:
advapi32.CloseServiceHandle(manager)
def win_service_report_event(service_name, msg, is_error=True):
with open('C:/sshkeys/log', 'a', encoding='utf-8') as f:
f.write(msg + '\n')
event_log = advapi32.RegisterEventSourceW(None, service_name)
if not event_log:
raise OSError('Could not report event: %s' % ctypes.FormatError())
try:
type_id = 0x0001 if is_error else 0x0004
event_id = 0xc0000000 if is_error else 0x40000000
lines = _ctypes_array(LPTSTR, [msg])
if not advapi32.ReportEventW(
event_log, type_id, 0, event_id, None, len(lines), 0,
lines, None):
raise OSError('Event reporting failed: %s' % ctypes.FormatError())
finally:
advapi32.DeregisterEventSource(event_log)
def win_service_handler(stop_event, *args):
try:
raise ValueError('Handler called with args ' + repr(args))
TODO
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def win_service_set_status(handle, status_code):
svcStatus = SERVICE_STATUS()
svcStatus.dwServiceType = SERVICE_WIN32_OWN_PROCESS
svcStatus.dwCurrentState = status_code
svcStatus.dwControlsAccepted = SERVICE_ACCEPT_STOP
svcStatus.dwServiceSpecificExitCode = 0
if not advapi32.SetServiceStatus(handle, ctypes.byref(svcStatus)):
raise OSError('SetServiceStatus failed: %r' % ctypes.FormatError())
def win_service_main(service_name, real_main, argc, argv_raw):
try:
#args = [argv_raw[i].value for i in range(argc)]
stop_event = threading.Event()
handler = HandlerEx(functools.partial(stop_event, win_service_handler))
h = advapi32.RegisterServiceCtrlHandlerExW(service_name, handler, None)
if not h:
raise OSError('Handler registration failed: %s' %
ctypes.FormatError())
TODO
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def win_service_start(service_name, real_main):
try:
cb = START_CALLBACK(
functools.partial(win_service_main, service_name, real_main))
dispatch_table = _ctypes_array(SERVICE_TABLE_ENTRY, [
SERVICE_TABLE_ENTRY(
service_name,
cb
),
SERVICE_TABLE_ENTRY(None, ctypes.cast(None, START_CALLBACK))
])
if not advapi32.StartServiceCtrlDispatcherW(dispatch_table):
raise OSError('ctypes start failed: %s' % ctypes.FormatError())
except Exception as e:
tb = traceback.format_exc()
msg = str(e) + '\n' + tb
win_service_report_event(service_name, msg, is_error=True)
raise
def main(args=None):
parser = argparse.ArgumentParser()
parser.add_argument('-i', '--install',
action='store_const', dest='action', const='install',
help='Launch at Windows startup')
parser.add_argument('-u', '--uninstall',
action='store_const', dest='action', const='uninstall',
help='Remove Windows service')
parser.add_argument('-s', '--service',
action='store_const', dest='action', const='service',
help='Run as a Windows service')
parser.add_argument('-b', '--bind', metavar='<host:port>',
action='store', default='localhost:8142',
help='Bind to host:port (default %default)')
options = parser.parse_args(args=args)
if options.action == 'install':
fn = os.path.abspath(__file__).replace('v:', '\\\\vboxsrv\\vbox')
cmdline = '%s %s -s -b %s' % (sys.executable, fn, options.bind)
win_install_service(SVCNAME, cmdline)
return
if options.action == 'uninstall':
win_uninstall_service(SVCNAME)
return
if options.action == 'service':
win_service_start(SVCNAME, main)
return
host, port_str = options.bind.split(':')
port = int(port_str)
print('Listening on %s:%d' % (host, port))
srv = BuildHTTPServer((host, port), BuildHTTPRequestHandler)
thr = threading.Thread(target=srv.serve_forever)
thr.start()
input('Press ENTER to shut down')
srv.shutdown()
thr.join()
def rmtree(path):
for name in os.listdir(path):
fname = os.path.join(path, name)
if os.path.isdir(fname):
rmtree(fname)
else:
os.chmod(fname, 0o666)
os.remove(fname)
os.rmdir(path)
#==============================================================================
class BuildError(Exception):
def __init__(self, output, code=500):
self.output = output
self.code = code
def __str__(self):
return self.output
class HTTPError(BuildError):
pass
class PythonBuilder(object):
def __init__(self, **kwargs):
pythonVersion = kwargs.pop('python', '2.7')
try:
key = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, r'SOFTWARE\Python\PythonCore\%s\InstallPath' % pythonVersion)
try:
self.pythonPath, _ = _winreg.QueryValueEx(key, '')
finally:
_winreg.CloseKey(key)
except Exception:
raise BuildError('No such Python version: %s' % pythonVersion)
super(PythonBuilder, self).__init__(**kwargs)
class GITInfoBuilder(object):
def __init__(self, **kwargs):
try:
self.user, self.repoName = kwargs['path'][:2]
self.rev = kwargs.pop('rev')
except ValueError:
raise BuildError('Invalid path')
except KeyError as e:
raise BuildError('Missing mandatory parameter "%s"' % e.args[0])
path = os.path.join(os.environ['APPDATA'], 'Build archive', self.repoName, self.user)
if not os.path.exists(path):
os.makedirs(path)
self.basePath = tempfile.mkdtemp(dir=path)
self.buildPath = os.path.join(self.basePath, 'build')
super(GITInfoBuilder, self).__init__(**kwargs)
class GITBuilder(GITInfoBuilder):
def build(self):
try:
subprocess.check_output(['git', 'clone', 'git://github.com/%s/%s.git' % (self.user, self.repoName), self.buildPath])
subprocess.check_output(['git', 'checkout', self.rev], cwd=self.buildPath)
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
super(GITBuilder, self).build()
class YoutubeDLBuilder(object):
authorizedUsers = ['fraca7', 'phihag', 'rg3', 'FiloSottile']
def __init__(self, **kwargs):
if self.repoName != 'youtube-dl':
raise BuildError('Invalid repository "%s"' % self.repoName)
if self.user not in self.authorizedUsers:
raise HTTPError('Unauthorized user "%s"' % self.user, 401)
super(YoutubeDLBuilder, self).__init__(**kwargs)
def build(self):
try:
subprocess.check_output([os.path.join(self.pythonPath, 'python.exe'), 'setup.py', 'py2exe'],
cwd=self.buildPath)
except subprocess.CalledProcessError as e:
raise BuildError(e.output)
super(YoutubeDLBuilder, self).build()
class DownloadBuilder(object):
def __init__(self, **kwargs):
self.handler = kwargs.pop('handler')
self.srcPath = os.path.join(self.buildPath, *tuple(kwargs['path'][2:]))
self.srcPath = os.path.abspath(os.path.normpath(self.srcPath))
if not self.srcPath.startswith(self.buildPath):
raise HTTPError(self.srcPath, 401)
super(DownloadBuilder, self).__init__(**kwargs)
def build(self):
if not os.path.exists(self.srcPath):
raise HTTPError('No such file', 404)
if os.path.isdir(self.srcPath):
raise HTTPError('Is a directory: %s' % self.srcPath, 401)
self.handler.send_response(200)
self.handler.send_header('Content-Type', 'application/octet-stream')
self.handler.send_header('Content-Disposition', 'attachment; filename=%s' % os.path.split(self.srcPath)[-1])
self.handler.send_header('Content-Length', str(os.stat(self.srcPath).st_size))
self.handler.end_headers()
with open(self.srcPath, 'rb') as src:
shutil.copyfileobj(src, self.handler.wfile)
super(DownloadBuilder, self).build()
class CleanupTempDir(object):
def build(self):
try:
rmtree(self.basePath)
except Exception as e:
print('WARNING deleting "%s": %s' % (self.basePath, e))
super(CleanupTempDir, self).build()
class Null(object):
def __init__(self, **kwargs):
pass
def start(self):
pass
def close(self):
pass
def build(self):
pass
class Builder(PythonBuilder, GITBuilder, YoutubeDLBuilder, DownloadBuilder, CleanupTempDir, Null):
pass
class BuildHTTPRequestHandler(BaseHTTPRequestHandler):
actionDict = { 'build': Builder, 'download': Builder } # They're the same, no more caching.
def do_GET(self):
path = urlparse.urlparse(self.path)
paramDict = dict([(key, value[0]) for key, value in urlparse.parse_qs(path.query).items()])
action, _, path = path.path.strip('/').partition('/')
if path:
path = path.split('/')
if action in self.actionDict:
try:
builder = self.actionDict[action](path=path, handler=self, **paramDict)
builder.start()
try:
builder.build()
finally:
builder.close()
except BuildError as e:
self.send_response(e.code)
msg = unicode(e).encode('UTF-8')
self.send_header('Content-Type', 'text/plain; charset=UTF-8')
self.send_header('Content-Length', len(msg))
self.end_headers()
self.wfile.write(msg)
except HTTPError as e:
self.send_response(e.code, str(e))
else:
self.send_response(500, 'Unknown build method "%s"' % action)
else:
self.send_response(500, 'Malformed URL')
#==============================================================================
if __name__ == '__main__':
main()

View File

@@ -3,31 +3,40 @@
import json
import sys
import hashlib
import urllib.request
import os.path
if len(sys.argv) <= 1:
print('Specify the version number as parameter')
sys.exit()
print('Specify the version number as parameter')
sys.exit()
version = sys.argv[1]
with open('update/LATEST_VERSION', 'w') as f:
f.write(version)
f.write(version)
versions_info = json.load(open('update/versions.json'))
if 'signature' in versions_info:
del versions_info['signature']
del versions_info['signature']
new_version = {}
filenames = {'bin': 'youtube-dl', 'exe': 'youtube-dl.exe', 'tar': 'youtube-dl-%s.tar.gz' % version}
filenames = {
'bin': 'youtube-dl',
'exe': 'youtube-dl.exe',
'tar': 'youtube-dl-%s.tar.gz' % version}
build_dir = os.path.join('..', '..', 'build', version)
for key, filename in filenames.items():
print('Downloading and checksumming %s...' %filename)
url = 'http://youtube-dl.org/downloads/%s/%s' % (version, filename)
data = urllib.request.urlopen(url).read()
sha256sum = hashlib.sha256(data).hexdigest()
new_version[key] = (url, sha256sum)
url = 'https://yt-dl.org/downloads/%s/%s' % (version, filename)
fn = os.path.join(build_dir, filename)
with open(fn, 'rb') as f:
data = f.read()
if not data:
raise ValueError('File %s is empty!' % fn)
sha256sum = hashlib.sha256(data).hexdigest()
new_version[key] = (url, sha256sum)
versions_info['versions'][version] = new_version
versions_info['latest'] = version
json.dump(versions_info, open('update/versions.json', 'w'), indent=4, sort_keys=True)
with open('update/versions.json', 'w') as jsonf:
json.dump(versions_info, jsonf, indent=4, sort_keys=True)

View File

@@ -22,7 +22,7 @@ entry_template=textwrap.dedent("""
<atom:link href="http://rg3.github.io/youtube-dl" />
<atom:content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Downloads available at <a href="http://youtube-dl.org/downloads/@VERSION@/">http://youtube-dl.org/downloads/@VERSION@/</a>
Downloads available at <a href="https://yt-dl.org/downloads/@VERSION@/">https://yt-dl.org/downloads/@VERSION@/</a>
</div>
</atom:content>
<atom:author>
@@ -54,4 +54,3 @@ atom_template = atom_template.replace('@ENTRIES@', entries_str)
with open('update/releases.atom','w',encoding='utf-8') as atom_file:
atom_file.write(atom_template)

View File

@@ -0,0 +1,34 @@
#!/usr/bin/env python3
import sys
import os
import textwrap
# We must be able to import youtube_dl
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
import youtube_dl
def main():
with open('supportedsites.html.in', 'r', encoding='utf-8') as tmplf:
template = tmplf.read()
ie_htmls = []
for ie in sorted(youtube_dl.gen_extractors(), key=lambda i: i.IE_NAME.lower()):
ie_html = '<b>{}</b>'.format(ie.IE_NAME)
ie_desc = getattr(ie, 'IE_DESC', None)
if ie_desc is False:
continue
elif ie_desc is not None:
ie_html += ': {}'.format(ie.IE_DESC)
if ie.working() == False:
ie_html += ' (Currently broken)'
ie_htmls.append('<li>{}</li>'.format(ie_html))
template = template.replace('@SITES@', textwrap.indent('\n'.join(ie_htmls), '\t'))
with open('supportedsites.html', 'w', encoding='utf-8') as sitesf:
sitesf.write(template)
if __name__ == '__main__':
main()

View File

@@ -55,8 +55,8 @@ git push origin "$version"
/bin/echo -e "\n### OK, now it is time to build the binaries..."
REV=$(git rev-parse HEAD)
make youtube-dl youtube-dl.tar.gz
wget "http://jeromelaheurte.net:8142/download/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe || \
wget "http://jeromelaheurte.net:8142/build/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe
read -p "VM running? (y/n) " -n 1
wget "http://localhost:8142/build/rg3/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe
mkdir -p "build/$version"
mv youtube-dl youtube-dl.exe "build/$version"
mv youtube-dl.tar.gz "build/$version/youtube-dl-$version.tar.gz"
@@ -67,7 +67,7 @@ RELEASE_FILES="youtube-dl youtube-dl.exe youtube-dl-$version.tar.gz"
(cd build/$version/ && sha512sum $RELEASE_FILES > SHA2-512SUMS)
git checkout HEAD -- youtube-dl youtube-dl.exe
/bin/echo -e "\n### Signing and uploading the new binaries to youtube-dl.org..."
/bin/echo -e "\n### Signing and uploading the new binaries to yt-dl.org ..."
for f in $RELEASE_FILES; do gpg --detach-sig "build/$version/$f"; done
scp -r "build/$version" ytdl@yt-dl.org:html/tmp/
ssh ytdl@yt-dl.org "mv html/tmp/$version html/downloads/"
@@ -85,6 +85,7 @@ ROOT=$(pwd)
"$ROOT/devscripts/gh-pages/sign-versions.py" < "$ROOT/updates_key.pem"
"$ROOT/devscripts/gh-pages/generate-download.py"
"$ROOT/devscripts/gh-pages/update-copyright.py"
"$ROOT/devscripts/gh-pages/update-sites.py"
git add *.html *.html.in update
git commit -m "release $version"
git show HEAD

View File

@@ -1,92 +0,0 @@
#!/usr/bin/env python
# Generate youtube signature algorithm from test cases
import sys
tests = [
# 92 - vflQw-fB4 2013/07/17
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<'`~\"",
"mrtyuioplkjhgfdsazxcvbnq1234567890QWERTY}IOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]\"|:;"),
# 90
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<'`",
"mrtyuioplkjhgfdsazxcvbne1234567890QWER[YUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={`]}|"),
# 88
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<",
"J:|}][{=+-_)(*&;%$#@>MNBVCXZASDFGH^KLPOIUYTREWQ0987654321mnbvcxzasdfghrklpoiuytej"),
# 87
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$^&*()_-+={[]}|:;?/>.<",
"!?;:|}][{=+-_)(*&^$#@/MNBVCXZASqFGHJKLPOIUYTREWQ0987654321mnbvcxzasdfghjklpoiuytr"),
# 86 - vfl_ymO4Z 2013/06/27
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[|};?/>.<",
"ertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!/#$%^&*()_-+={[|};?@"),
# 85 - vflSAFCP9 2013/07/19
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[};?/>.<",
"ertyuiqplkjhgfdsazx$vbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#<%^&*()_-+={[};?/c"),
# 84
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[};?>.<",
"<.>?;}[{=+-_)(*&^%$#@!MNBVCXZASDFGHJKLPOIUYTREWe098765432rmnbvcxzasdfghjklpoiuyt1"),
# 83 - vflcaqGO8 2013/07/11
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!#$%^&*()_+={[};?/>.<",
"urty8ioplkjhgfdsazxcvbqm1234567S90QWERTYUIOPLKJHGFDnAZXCVBNM!#$%^&*()_+={[};?/>.<"),
# 82
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/>.<",
"Q>/?;}[{=+-(*<^%$#@!MNBVCXZASDFGHKLPOIUY8REWT0q&7654321mnbvcxzasdfghjklpoiuytrew9"),
# 81
("qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/>.",
"urty8ioplkjhgfdsazxcvbqm1234567e90QWERTYUIOPLKHGFDSnZXCVBNM!@#$%^&*(-+={[};?/>."),
]
def find_matching(wrong, right):
idxs = [wrong.index(c) for c in right]
return compress(idxs)
return ('s[%d]' % i for i in idxs)
def compress(idxs):
def _genslice(start, end, step):
starts = '' if start == 0 else str(start)
ends = ':%d' % (end+step)
steps = '' if step == 1 else (':%d' % step)
return 's[%s%s%s]' % (starts, ends, steps)
step = None
for i, prev in zip(idxs[1:], idxs[:-1]):
if step is not None:
if i - prev == step:
continue
yield _genslice(start, prev, step)
step = None
continue
if i - prev in [-1, 1]:
step = i - prev
start = prev
continue
else:
yield 's[%d]' % prev
if step is None:
yield 's[%d]' % i
else:
yield _genslice(start, i, step)
def _assert_compress(inp, exp):
res = list(compress(inp))
if res != exp:
print('Got %r, expected %r' % (res, exp))
assert res == exp
_assert_compress([0,2,4,6], ['s[0]', 's[2]', 's[4]', 's[6]'])
_assert_compress([0,1,2,4,6,7], ['s[:3]', 's[4]', 's[6:8]'])
_assert_compress([8,0,1,2,4,7,6,9], ['s[8]', 's[:3]', 's[4]', 's[7:5:-1]', 's[9]'])
def gen(wrong, right, indent):
code = ' + '.join(find_matching(wrong, right))
return 'if len(s) == %d:\n%s return %s\n' % (len(wrong), indent, code)
def genall(tests):
indent = ' ' * 8
return indent + (indent + 'el').join(gen(wrong, right, indent) for wrong,right in tests)
def main():
print(genall(tests))
if __name__ == '__main__':
main()

0
test/__init__.py Normal file
View File

View File

@@ -1,38 +1,63 @@
import errno
import io
import hashlib
import json
import os.path
import re
import types
import youtube_dl.extractor
from youtube_dl import YoutubeDL, YoutubeDLHandler
from youtube_dl.utils import (
compat_cookiejar,
compat_urllib_request,
)
from youtube_dl import YoutubeDL
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
def global_setup():
youtube_dl._setup_opener(timeout=10)
def get_params(override=None):
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
"parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
if override:
parameters.update(override)
return parameters
def try_rm(filename):
""" Remove a file if it exists """
try:
os.remove(filename)
except OSError as ose:
if ose.errno != errno.ENOENT:
raise
class FakeYDL(YoutubeDL):
def __init__(self):
self.result = []
# Different instances of the downloader can't share the same dictionary
# some test set the "sublang" parameter, which would break the md5 checks.
self.params = dict(parameters)
def to_screen(self, s):
params = get_params()
super(FakeYDL, self).__init__(params)
self.result = []
def to_screen(self, s, skip_eol=None):
print(s)
def trouble(self, s, tb=None):
raise Exception(s)
def download(self, x):
self.result.append(x)
def expect_warning(self, regex):
# Silence an expected warning matching a regex
old_report_warning = self.report_warning
def report_warning(self, message):
if re.match(regex, message): return
old_report_warning(message)
self.report_warning = types.MethodType(report_warning, self)
def get_testcases():
for ie in youtube_dl.extractor.gen_extractors():
t = getattr(ie, '_TEST', None)
@@ -42,3 +67,6 @@ def get_testcases():
for t in getattr(ie, '_TESTS', []):
t['name'] = type(ie).__name__[:-len('IE')]
yield t
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()

View File

@@ -38,7 +38,6 @@
"writedescription": false,
"writeinfojson": true,
"writesubtitles": false,
"onlysubtitles": false,
"allsubtitles": false,
"listssubtitles": false
}

View File

@@ -0,0 +1,55 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import global_setup, try_rm
global_setup()
from youtube_dl import YoutubeDL
def _download_restricted(url, filename, age):
""" Returns true iff the file has been downloaded """
params = {
'age_limit': age,
'skip_download': True,
'writeinfojson': True,
"outtmpl": "%(id)s.%(ext)s",
}
ydl = YoutubeDL(params)
ydl.add_default_info_extractors()
json_filename = filename + '.info.json'
try_rm(json_filename)
ydl.download([url])
res = os.path.exists(json_filename)
try_rm(json_filename)
return res
class TestAgeRestriction(unittest.TestCase):
def _assert_restricted(self, url, filename, age, old_age=None):
self.assertTrue(_download_restricted(url, filename, old_age))
self.assertFalse(_download_restricted(url, filename, age))
def test_youtube(self):
self._assert_restricted('07FYdnEawAQ', '07FYdnEawAQ.mp4', 10)
def test_youporn(self):
self._assert_restricted(
'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
'505835.mp4', 2, old_age=25)
def test_pornotube(self):
self._assert_restricted(
'http://pornotube.com/c/173/m/1689755/Marilyn-Monroe-Bathing',
'1689755.flv', 13)
if __name__ == '__main__':
unittest.main()

View File

@@ -1,34 +1,66 @@
#!/usr/bin/env python
import sys
import unittest
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_testcases
from youtube_dl.extractor import (
gen_extractors,
JustinTVIE,
YoutubeIE,
)
from youtube_dl.extractor import YoutubeIE, YoutubePlaylistIE, YoutubeChannelIE, JustinTVIE, gen_extractors
from helper import get_testcases
class TestAllURLsMatching(unittest.TestCase):
def setUp(self):
self.ies = gen_extractors()
def matching_ies(self, url):
return [ie.IE_NAME for ie in self.ies if ie.suitable(url) and ie.IE_NAME != 'generic']
def assertMatch(self, url, ie_list):
self.assertEqual(self.matching_ies(url), ie_list)
def test_youtube_playlist_matching(self):
self.assertTrue(YoutubePlaylistIE.suitable(u'ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8'))
self.assertTrue(YoutubePlaylistIE.suitable(u'UUBABnxM4Ar9ten8Mdjj1j0Q')) #585
self.assertTrue(YoutubePlaylistIE.suitable(u'PL63F0C78739B09958'))
self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q'))
self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8'))
self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC'))
self.assertTrue(YoutubePlaylistIE.suitable(u'https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012')) #668
self.assertFalse(YoutubePlaylistIE.suitable(u'PLtS2H6bU1M'))
assertPlaylist = lambda url: self.assertMatch(url, ['youtube:playlist'])
assertPlaylist(u'ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist(u'UUBABnxM4Ar9ten8Mdjj1j0Q') #585
assertPlaylist(u'PL63F0C78739B09958')
assertPlaylist(u'https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
assertPlaylist(u'https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist(u'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')
assertPlaylist(u'https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012') #668
self.assertFalse('youtube:playlist' in self.matching_ies(u'PLtS2H6bU1M'))
def test_youtube_matching(self):
self.assertTrue(YoutubeIE.suitable(u'PLtS2H6bU1M'))
self.assertFalse(YoutubeIE.suitable(u'https://www.youtube.com/watch?v=AV6J6_AeFEQ&playnext=1&list=PL4023E734DA416012')) #668
self.assertMatch('http://youtu.be/BaW_jenozKc', ['youtube'])
self.assertMatch('http://www.youtube.com/v/BaW_jenozKc', ['youtube'])
self.assertMatch('https://youtube.googleapis.com/v/BaW_jenozKc', ['youtube'])
def test_youtube_channel_matching(self):
self.assertTrue(YoutubeChannelIE.suitable('https://www.youtube.com/channel/HCtnHdj3df7iM'))
self.assertTrue(YoutubeChannelIE.suitable('https://www.youtube.com/channel/HCtnHdj3df7iM?feature=gb_ch_rec'))
self.assertTrue(YoutubeChannelIE.suitable('https://www.youtube.com/channel/HCtnHdj3df7iM/videos'))
assertChannel = lambda url: self.assertMatch(url, ['youtube:channel'])
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM')
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM?feature=gb_ch_rec')
assertChannel('https://www.youtube.com/channel/HCtnHdj3df7iM/videos')
def test_youtube_user_matching(self):
self.assertMatch('www.youtube.com/NASAgovVideo/videos', ['youtube:user'])
def test_youtube_feeds(self):
self.assertMatch('https://www.youtube.com/feed/watch_later', ['youtube:watch_later'])
self.assertMatch('https://www.youtube.com/feed/subscriptions', ['youtube:subscriptions'])
self.assertMatch('https://www.youtube.com/feed/recommended', ['youtube:recommended'])
self.assertMatch('https://www.youtube.com/my_favorites', ['youtube:favorites'])
def test_youtube_show_matching(self):
self.assertMatch('http://www.youtube.com/show/airdisasters', ['youtube:show'])
def test_justin_tv_channelid_matching(self):
self.assertTrue(JustinTVIE.suitable(u"justin.tv/vanillatv"))
@@ -47,9 +79,13 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertTrue(JustinTVIE.suitable(u"http://www.twitch.tv/tsm_theoddone/c/2349361"))
def test_youtube_extract(self):
self.assertEqual(YoutubeIE()._extract_id('http://www.youtube.com/watch?&v=BaW_jenozKc'), 'BaW_jenozKc')
self.assertEqual(YoutubeIE()._extract_id('https://www.youtube.com/watch?&v=BaW_jenozKc'), 'BaW_jenozKc')
self.assertEqual(YoutubeIE()._extract_id('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc'), 'BaW_jenozKc')
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE()._extract_id(url), id)
assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('https://www.youtube.com/watch_popup?v=BaW_jenozKc', 'BaW_jenozKc')
assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
def test_no_duplicates(self):
ies = gen_extractors()
@@ -62,15 +98,12 @@ class TestAllURLsMatching(unittest.TestCase):
self.assertFalse(ie.suitable(url), '%s should not match URL %r' % (type(ie).__name__, url))
def test_keywords(self):
ies = gen_extractors()
matching_ies = lambda url: [ie.IE_NAME for ie in ies
if ie.suitable(url) and ie.IE_NAME != 'generic']
self.assertEqual(matching_ies(':ytsubs'), ['youtube:subscriptions'])
self.assertEqual(matching_ies(':ytsubscriptions'), ['youtube:subscriptions'])
self.assertEqual(matching_ies(':thedailyshow'), ['ComedyCentral'])
self.assertEqual(matching_ies(':tds'), ['ComedyCentral'])
self.assertEqual(matching_ies(':colbertreport'), ['ComedyCentral'])
self.assertEqual(matching_ies(':cr'), ['ComedyCentral'])
self.assertMatch(':ytsubs', ['youtube:subscriptions'])
self.assertMatch(':ytsubscriptions', ['youtube:subscriptions'])
self.assertMatch(':thedailyshow', ['ComedyCentral'])
self.assertMatch(':tds', ['ComedyCentral'])
self.assertMatch(':colbertreport', ['ComedyCentral'])
self.assertMatch(':cr', ['ComedyCentral'])
if __name__ == '__main__':

View File

@@ -0,0 +1,70 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, global_setup, md5
global_setup()
from youtube_dl.extractor import DailymotionIE
class TestDailymotionSubtitles(unittest.TestCase):
def setUp(self):
self.DL = FakeYDL()
self.url = 'http://www.dailymotion.com/video/xczg00'
def getInfoDict(self):
IE = DailymotionIE(self.DL)
info_dict = IE.extract(self.url)
return info_dict
def getSubtitles(self):
info_dict = self.getInfoDict()
return info_dict[0]['subtitles']
def test_no_writesubtitles(self):
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_subtitles(self):
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '976553874490cba125086bbfea3ff76f')
def test_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['fr']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['fr']), '594564ec7d588942e384e920e5341792')
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 5)
def test_list_subtitles(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_automatic_captions(self):
self.DL.expect_warning(u'Automatic Captions not supported by this server')
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslang'] = ['en']
subtitles = self.getSubtitles()
self.assertTrue(len(subtitles.keys()) == 0)
def test_nosubtitles(self):
self.DL.expect_warning(u'video doesn\'t have subtitles')
self.url = 'http://www.dailymotion.com/video/x12u166_le-zapping-tele-star-du-08-aout-2013_tv'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles), 0)
def test_multiple_langs(self):
self.DL.params['writesubtitles'] = True
langs = ['es', 'fr', 'de']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
if __name__ == '__main__':
unittest.main()

View File

@@ -1,43 +1,31 @@
#!/usr/bin/env python
import errno
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params, get_testcases, global_setup, try_rm, md5
global_setup()
import hashlib
import io
import os
import json
import unittest
import sys
import socket
import binascii
# Allow direct execution
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import youtube_dl.YoutubeDL
from youtube_dl.utils import *
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
from youtube_dl.utils import (
compat_str,
compat_urllib_error,
DownloadError,
ExtractorError,
UnavailableVideoError,
)
RETRIES = 3
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(10)
def _try_rm(filename):
""" Remove a file if it exists """
try:
os.remove(filename)
except OSError as ose:
if ose.errno != errno.ENOENT:
raise
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
self.to_stderr = self.to_screen
@@ -54,17 +42,12 @@ def _file_md5(fn):
with open(fn, 'rb') as f:
return hashlib.md5(f.read()).hexdigest()
from helper import get_testcases
defs = get_testcases()
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
class TestDownload(unittest.TestCase):
maxDiff = None
def setUp(self):
self.parameters = parameters
self.defs = defs
### Dynamically generate tests
@@ -84,8 +67,7 @@ def generator(test_case):
print_skipping(test_case['skip'])
return
params = self.parameters.copy()
params.update(test_case.get('params', {}))
params = get_params(test_case.get('params', {}))
ydl = YoutubeDL(params)
ydl.add_default_info_extractors()
@@ -97,9 +79,9 @@ def generator(test_case):
test_cases = test_case.get('playlist', [test_case])
for tc in test_cases:
_try_rm(tc['file'])
_try_rm(tc['file'] + '.part')
_try_rm(tc['file'] + '.info.json')
try_rm(tc['file'])
try_rm(tc['file'] + '.part')
try_rm(tc['file'] + '.info.json')
try:
for retry in range(1, RETRIES + 1):
try:
@@ -127,12 +109,11 @@ def generator(test_case):
info_dict = json.load(infof)
for (info_field, expected) in tc.get('info_dict', {}).items():
if isinstance(expected, compat_str) and expected.startswith('md5:'):
self.assertEqual(expected, 'md5:' + md5(info_dict.get(info_field)))
got = 'md5:' + md5(info_dict.get(info_field))
else:
got = info_dict.get(info_field)
self.assertEqual(
expected, got,
u'invalid value for field %s, expected %r, got %r' % (info_field, expected, got))
self.assertEqual(expected, got,
u'invalid value for field %s, expected %r, got %r' % (info_field, expected, got))
# If checkable fields are missing from the test case, print the info_dict
test_info_dict = dict((key, value if not isinstance(value, compat_str) or len(value) < 250 else 'md5:' + md5(value))
@@ -146,9 +127,9 @@ def generator(test_case):
self.assertTrue(key in info_dict.keys() and info_dict[key])
finally:
for tc in test_cases:
_try_rm(tc['file'])
_try_rm(tc['file'] + '.part')
_try_rm(tc['file'] + '.info.json')
try_rm(tc['file'])
try_rm(tc['file'] + '.part')
try_rm(tc['file'] + '.info.json')
return test_template

89
test/test_playlists.py Normal file
View File

@@ -0,0 +1,89 @@
#!/usr/bin/env python
# encoding: utf-8
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, global_setup
global_setup()
from youtube_dl.extractor import (
DailymotionPlaylistIE,
DailymotionUserIE,
VimeoChannelIE,
UstreamChannelIE,
SoundcloudUserIE,
LivestreamIE,
NHLVideocenterIE,
)
class TestPlaylists(unittest.TestCase):
def assertIsPlaylist(self, info):
"""Make sure the info has '_type' set to 'playlist'"""
self.assertEqual(info['_type'], 'playlist')
def test_dailymotion_playlist(self):
dl = FakeYDL()
ie = DailymotionPlaylistIE(dl)
result = ie.extract('http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'SPORT')
self.assertTrue(len(result['entries']) > 20)
def test_dailymotion_user(self):
dl = FakeYDL()
ie = DailymotionUserIE(dl)
result = ie.extract('http://www.dailymotion.com/user/generation-quoi/')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'Génération Quoi')
self.assertTrue(len(result['entries']) >= 26)
def test_vimeo_channel(self):
dl = FakeYDL()
ie = VimeoChannelIE(dl)
result = ie.extract('http://vimeo.com/channels/tributes')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'Vimeo Tributes')
self.assertTrue(len(result['entries']) > 24)
def test_ustream_channel(self):
dl = FakeYDL()
ie = UstreamChannelIE(dl)
result = ie.extract('http://www.ustream.tv/channel/young-americans-for-liberty')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], u'5124905')
self.assertTrue(len(result['entries']) >= 11)
def test_soundcloud_user(self):
dl = FakeYDL()
ie = SoundcloudUserIE(dl)
result = ie.extract('https://soundcloud.com/the-concept-band')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], u'9615865')
self.assertTrue(len(result['entries']) >= 12)
def test_livestream_event(self):
dl = FakeYDL()
ie = LivestreamIE(dl)
result = ie.extract('http://new.livestream.com/tedx/cityenglish')
self.assertIsPlaylist(result)
self.assertEqual(result['title'], u'TEDCity2.0 (English)')
self.assertTrue(len(result['entries']) >= 4)
def test_nhl_videocenter(self):
dl = FakeYDL()
ie = NHLVideocenterIE(dl)
result = ie.extract('http://video.canucks.nhl.com/videocenter/console?catid=999')
self.assertIsPlaylist(result)
self.assertEqual(result['id'], u'999')
self.assertEqual(result['title'], u'Highlights')
self.assertEqual(len(result['entries']), 12)
if __name__ == '__main__':
unittest.main()

View File

@@ -1,23 +1,30 @@
#!/usr/bin/env python
# Various small unit tests
import sys
import unittest
import xml.etree.ElementTree
# coding: utf-8
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Various small unit tests
import xml.etree.ElementTree
#from youtube_dl.utils import htmlentity_transform
from youtube_dl.utils import timeconvert
from youtube_dl.utils import sanitize_filename
from youtube_dl.utils import unescapeHTML
from youtube_dl.utils import orderedSet
from youtube_dl.utils import DateRange
from youtube_dl.utils import unified_strdate
from youtube_dl.utils import find_xpath_attr
from youtube_dl.utils import (
timeconvert,
sanitize_filename,
unescapeHTML,
orderedSet,
DateRange,
unified_strdate,
find_xpath_attr,
get_meta_content,
xpath_with_ns,
smuggle_url,
unsmuggle_url,
)
if sys.version_info < (3, 0):
_compat_str = lambda b: b.decode('unicode-escape')
@@ -127,5 +134,42 @@ class TestUtil(unittest.TestCase):
self.assertEqual(find_xpath_attr(doc, './/node', 'x', 'a'), doc[1])
self.assertEqual(find_xpath_attr(doc, './/node', 'y', 'c'), doc[2])
def test_meta_parser(self):
testhtml = u'''
<head>
<meta name="description" content="foo &amp; bar">
<meta content='Plato' name='author'/>
</head>
'''
get_meta = lambda name: get_meta_content(name, testhtml)
self.assertEqual(get_meta('description'), u'foo & bar')
self.assertEqual(get_meta('author'), 'Plato')
def test_xpath_with_ns(self):
testxml = u'''<root xmlns:media="http://example.com/">
<media:song>
<media:author>The Author</media:author>
<url>http://server.com/download.mp3</url>
</media:song>
</root>'''
doc = xml.etree.ElementTree.fromstring(testxml)
find = lambda p: doc.find(xpath_with_ns(p, {'media': 'http://example.com/'}))
self.assertTrue(find('media:song') is not None)
self.assertEqual(find('media:song/media:author').text, u'The Author')
self.assertEqual(find('media:song/url').text, u'http://server.com/download.mp3')
def test_smuggle_url(self):
data = {u"ö": u"ö", u"abc": [3]}
url = 'https://foo.bar/baz?x=y#a'
smug_url = smuggle_url(url, data)
unsmug_url, unsmug_data = unsmuggle_url(smug_url)
self.assertEqual(url, unsmug_url)
self.assertEqual(data, unsmug_data)
res_url, res_data = unsmuggle_url(url)
self.assertEqual(res_url, url)
self.assertEqual(res_data, None)
if __name__ == '__main__':
unittest.main()

View File

@@ -0,0 +1,80 @@
#!/usr/bin/env python
# coding: utf-8
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params, global_setup, try_rm
global_setup()
import io
import xml.etree.ElementTree
import youtube_dl.YoutubeDL
import youtube_dl.extractor
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
super(YoutubeDL, self).__init__(*args, **kwargs)
self.to_stderr = self.to_screen
params = get_params({
'writeannotations': True,
'skip_download': True,
'writeinfojson': False,
'format': 'flv',
})
TEST_ID = 'gr51aVj-mLg'
ANNOTATIONS_FILE = TEST_ID + '.flv.annotations.xml'
EXPECTED_ANNOTATIONS = ['Speech bubble', 'Note', 'Title', 'Spotlight', 'Label']
class TestAnnotations(unittest.TestCase):
def setUp(self):
# Clear old files
self.tearDown()
def test_info_json(self):
expected = list(EXPECTED_ANNOTATIONS) #Two annotations could have the same text.
ie = youtube_dl.extractor.YoutubeIE()
ydl = YoutubeDL(params)
ydl.add_info_extractor(ie)
ydl.download([TEST_ID])
self.assertTrue(os.path.exists(ANNOTATIONS_FILE))
annoxml = None
with io.open(ANNOTATIONS_FILE, 'r', encoding='utf-8') as annof:
annoxml = xml.etree.ElementTree.parse(annof)
self.assertTrue(annoxml is not None, 'Failed to parse annotations XML')
root = annoxml.getroot()
self.assertEqual(root.tag, 'document')
annotationsTag = root.find('annotations')
self.assertEqual(annotationsTag.tag, 'annotations')
annotations = annotationsTag.findall('annotation')
#Not all the annotations have TEXT children and the annotations are returned unsorted.
for a in annotations:
self.assertEqual(a.tag, 'annotation')
if a.get('type') == 'text':
textTag = a.find('TEXT')
text = textTag.text
self.assertTrue(text in expected) #assertIn only added in python 2.7
#remove the first occurance, there could be more than one annotation with the same text
expected.remove(text)
#We should have seen (and removed) all the expected annotation texts.
self.assertEqual(len(expected), 0, 'Not all expected annotations were found.')
def tearDown(self):
try_rm(ANNOTATIONS_FILE)
if __name__ == '__main__':
unittest.main()

View File

@@ -1,37 +1,34 @@
#!/usr/bin/env python
# coding: utf-8
import json
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# Allow direct execution
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import get_params, global_setup
global_setup()
import io
import json
import youtube_dl.YoutubeDL
import youtube_dl.extractor
from youtube_dl.utils import *
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "parameters.json")
# General configuration (from __init__, not very elegant...)
jar = compat_cookiejar.CookieJar()
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
proxy_handler = compat_urllib_request.ProxyHandler()
opener = compat_urllib_request.build_opener(proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
class YoutubeDL(youtube_dl.YoutubeDL):
def __init__(self, *args, **kwargs):
super(YoutubeDL, self).__init__(*args, **kwargs)
self.to_stderr = self.to_screen
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
params = json.load(pf)
params['writeinfojson'] = True
params['skip_download'] = True
params['writedescription'] = True
params = get_params({
'writeinfojson': True,
'skip_download': True,
'writedescription': True,
})
TEST_ID = 'BaW_jenozKc'
INFO_JSON_FILE = TEST_ID + '.mp4.info.json'
@@ -42,6 +39,7 @@ This is a test video for youtube-dl.
For more information, contact phihag@phihag.de .'''
class TestInfoJSON(unittest.TestCase):
def setUp(self):
# Clear old files

View File

@@ -1,20 +1,26 @@
#!/usr/bin/env python
import sys
import unittest
import json
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor import YoutubeUserIE, YoutubePlaylistIE, YoutubeIE, YoutubeChannelIE, YoutubeShowIE
from youtube_dl.utils import *
from test.helper import FakeYDL, global_setup
global_setup()
from youtube_dl.extractor import (
YoutubeUserIE,
YoutubePlaylistIE,
YoutubeIE,
YoutubeChannelIE,
YoutubeShowIE,
)
from helper import FakeYDL
class TestYoutubeLists(unittest.TestCase):
def assertIsPlaylist(self,info):
def assertIsPlaylist(self, info):
"""Make sure the info has '_type' set to 'playlist'"""
self.assertEqual(info['_type'], 'playlist')
@@ -27,6 +33,14 @@ class TestYoutubeLists(unittest.TestCase):
ytie_results = [YoutubeIE()._extract_id(url['url']) for url in result['entries']]
self.assertEqual(ytie_results, [ 'bV9L5Ht9LgY', 'FXxLjLQi3Fg', 'tU3Bgo5qJZE'])
def test_youtube_playlist_noplaylist(self):
dl = FakeYDL()
dl.params['noplaylist'] = True
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=FXxLjLQi3Fg&list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
self.assertEqual(result['_type'], 'url')
self.assertEqual(YoutubeIE()._extract_id(result['url']), 'FXxLjLQi3Fg')
def test_issue_673(self):
dl = FakeYDL()
ie = YoutubePlaylistIE(dl)
@@ -92,7 +106,7 @@ class TestYoutubeLists(unittest.TestCase):
dl = FakeYDL()
ie = YoutubeShowIE(dl)
result = ie.extract('http://www.youtube.com/show/airdisasters')
self.assertTrue(len(result) >= 4)
self.assertTrue(len(result) >= 3)
if __name__ == '__main__':
unittest.main()

View File

@@ -1,67 +0,0 @@
#!/usr/bin/env python
import unittest
import sys
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.extractor.youtube import YoutubeIE
from helper import FakeYDL
sig = YoutubeIE(FakeYDL())._decrypt_signature
class TestYoutubeSig(unittest.TestCase):
def test_92(self):
wrong = "F9F9B6E6FD47029957AB911A964CC20D95A181A5D37A2DBEFD67D403DB0E8BE4F4910053E4E8A79.0B70B.0B80B8"
right = "69B6E6FD47029957AB911A9F4CC20D95A181A5D3.A2DBEFD67D403DB0E8BE4F4910053E4E8A7980B7"
self.assertEqual(sig(wrong), right)
def test_90(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<'`"
right = "mrtyuioplkjhgfdsazxcvbne1234567890QWER[YUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={`]}|"
self.assertEqual(sig(wrong), right)
def test_88(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[]}|:;?/>.<"
right = "J:|}][{=+-_)(*&;%$#@>MNBVCXZASDFGH^KLPOIUYTREWQ0987654321mnbvcxzasdfghrklpoiuytej"
self.assertEqual(sig(wrong), right)
def test_87(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$^&*()_-+={[]}|:;?/>.<"
right = "!?;:|}][{=+-_)(*&^$#@/MNBVCXZASqFGHJKLPOIUYTREWQ0987654321mnbvcxzasdfghjklpoiuytr"
self.assertEqual(sig(wrong), right)
def test_86(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[|};?/>.<"
right = "ertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!/#$%^&*()_-+={[|};?@"
self.assertEqual(sig(wrong), right)
def test_85(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[};?/>.<"
right = "ertyuiqplkjhgfdsazx$vbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#<%^&*()_-+={[};?/c"
self.assertEqual(sig(wrong), right)
def test_84(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!@#$%^&*()_-+={[};?>.<"
right = "<.>?;}[{=+-_)(*&^%$#@!MNBVCXZASDFGHJKLPOIUYTREWe098765432rmnbvcxzasdfghjklpoiuyt1"
self.assertEqual(sig(wrong), right)
def test_83(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKJHGFDSAZXCVBNM!#$%^&*()_+={[};?/>.<"
right = "urty8ioplkjhgfdsazxcvbqm1234567S90QWERTYUIOPLKJHGFDnAZXCVBNM!#$%^&*()_+={[};?/>.<"
self.assertEqual(sig(wrong), right)
def test_82(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/>.<"
right = "Q>/?;}[{=+-(*<^%$#@!MNBVCXZASDFGHKLPOIUY8REWT0q&7654321mnbvcxzasdfghjklpoiuytrew9"
self.assertEqual(sig(wrong), right)
def test_81(self):
wrong = "qwertyuioplkjhgfdsazxcvbnm1234567890QWERTYUIOPLKHGFDSAZXCVBNM!@#$%^&*(-+={[};?/>."
right = "urty8ioplkjhgfdsazxcvbqm1234567e90QWERTYUIOPLKHGFDSnZXCVBNM!@#$%^&*(-+={[};?/>."
self.assertEqual(sig(wrong), right)
if __name__ == '__main__':
unittest.main()

View File

@@ -0,0 +1,84 @@
#!/usr/bin/env python
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import global_setup
global_setup()
import io
import re
import string
from youtube_dl.extractor import YoutubeIE
from youtube_dl.utils import compat_str, compat_urlretrieve
_TESTS = [
(
u'https://s.ytimg.com/yts/jsbin/html5player-vflHOr_nV.js',
u'js',
86,
u'>=<;:/.-[+*)(\'&%$#"!ZYX0VUTSRQPONMLKJIHGFEDCBA\\yxwvutsrqponmlkjihgfedcba987654321',
),
(
u'https://s.ytimg.com/yts/jsbin/html5player-vfldJ8xgI.js',
u'js',
85,
u'3456789a0cdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRS[UVWXYZ!"#$%&\'()*+,-./:;<=>?@',
),
(
u'https://s.ytimg.com/yts/swfbin/watch_as3-vflg5GhxU.swf',
u'swf',
82,
u':/.-,+*)=\'&%$#"!ZYX0VUTSRQPONMLKJIHGFEDCBAzyxw>utsrqponmlkjihgfedcba987654321'
),
]
class TestSignature(unittest.TestCase):
def setUp(self):
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
self.TESTDATA_DIR = os.path.join(TEST_DIR, 'testdata')
if not os.path.exists(self.TESTDATA_DIR):
os.mkdir(self.TESTDATA_DIR)
def make_tfunc(url, stype, sig_length, expected_sig):
basename = url.rpartition('/')[2]
m = re.match(r'.*-([a-zA-Z0-9_-]+)\.[a-z]+$', basename)
assert m, '%r should follow URL format' % basename
test_id = m.group(1)
def test_func(self):
fn = os.path.join(self.TESTDATA_DIR, basename)
if not os.path.exists(fn):
compat_urlretrieve(url, fn)
ie = YoutubeIE()
if stype == 'js':
with io.open(fn, encoding='utf-8') as testf:
jscode = testf.read()
func = ie._parse_sig_js(jscode)
else:
assert stype == 'swf'
with open(fn, 'rb') as testf:
swfcode = testf.read()
func = ie._parse_sig_swf(swfcode)
src_sig = compat_str(string.printable[:sig_length])
got_sig = func(src_sig)
self.assertEqual(got_sig, expected_sig)
test_func.__name__ = str('test_signature_' + stype + '_' + test_id)
setattr(TestSignature, test_func.__name__, test_func)
for test_spec in _TESTS:
make_tfunc(*test_spec)
if __name__ == '__main__':
unittest.main()

View File

@@ -1,95 +1,95 @@
#!/usr/bin/env python
import sys
import unittest
import json
import io
import hashlib
# Allow direct execution
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, global_setup, md5
global_setup()
from youtube_dl.extractor import YoutubeIE
from youtube_dl.utils import *
from helper import FakeYDL
md5 = lambda s: hashlib.md5(s.encode('utf-8')).hexdigest()
class TestYoutubeSubtitles(unittest.TestCase):
def setUp(self):
DL = FakeYDL()
DL.params['allsubtitles'] = False
DL.params['writesubtitles'] = False
DL.params['subtitlesformat'] = 'srt'
DL.params['listsubtitles'] = False
def test_youtube_no_subtitles(self):
DL = FakeYDL()
DL.params['writesubtitles'] = False
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
subtitles = info_dict[0]['subtitles']
self.DL = FakeYDL()
self.url = 'QRS8MkLhQmM'
def getInfoDict(self):
IE = YoutubeIE(self.DL)
info_dict = IE.extract(self.url)
return info_dict
def getSubtitles(self):
info_dict = self.getInfoDict()
return info_dict[0]['subtitles']
def test_youtube_no_writesubtitles(self):
self.DL.params['writesubtitles'] = False
subtitles = self.getSubtitles()
self.assertEqual(subtitles, None)
def test_youtube_subtitles(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
sub = info_dict[0]['subtitles'][0]
self.assertEqual(md5(sub[2]), '4cd9278a35ba2305f47354ee13472260')
def test_youtube_subtitles_it(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
DL.params['subtitleslang'] = 'it'
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
sub = info_dict[0]['subtitles'][0]
self.assertEqual(md5(sub[2]), '164a51f16f260476a05b50fe4c2f161d')
def test_youtube_onlysubtitles(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
DL.params['onlysubtitles'] = True
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
sub = info_dict[0]['subtitles'][0]
self.assertEqual(md5(sub[2]), '4cd9278a35ba2305f47354ee13472260')
self.DL.params['writesubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '4cd9278a35ba2305f47354ee13472260')
def test_youtube_subtitles_lang(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['it']), '164a51f16f260476a05b50fe4c2f161d')
def test_youtube_allsubtitles(self):
DL = FakeYDL()
DL.params['allsubtitles'] = True
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
subtitles = info_dict[0]['subtitles']
self.assertEqual(len(subtitles), 13)
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 13)
def test_youtube_subtitles_sbv_format(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
DL.params['subtitlesformat'] = 'sbv'
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
sub = info_dict[0]['subtitles'][0]
self.assertEqual(md5(sub[2]), '13aeaa0c245a8bed9a451cb643e3ad8b')
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'sbv'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '13aeaa0c245a8bed9a451cb643e3ad8b')
def test_youtube_subtitles_vtt_format(self):
DL = FakeYDL()
DL.params['writesubtitles'] = True
DL.params['subtitlesformat'] = 'vtt'
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
sub = info_dict[0]['subtitles'][0]
self.assertEqual(md5(sub[2]), '356cdc577fde0c6783b9b822e7206ff7')
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'vtt'
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['en']), '356cdc577fde0c6783b9b822e7206ff7')
def test_youtube_list_subtitles(self):
DL = FakeYDL()
DL.params['listsubtitles'] = True
IE = YoutubeIE(DL)
info_dict = IE.extract('QRS8MkLhQmM')
self.DL.expect_warning(u'Video doesn\'t have automatic captions')
self.DL.params['listsubtitles'] = True
info_dict = self.getInfoDict()
self.assertEqual(info_dict, None)
def test_youtube_automatic_captions(self):
DL = FakeYDL()
DL.params['writeautomaticsub'] = True
DL.params['subtitleslang'] = 'it'
IE = YoutubeIE(DL)
info_dict = IE.extract('8YoUxe5ncPo')
sub = info_dict[0]['subtitles'][0]
self.assertTrue(sub[2] is not None)
self.url = '8YoUxe5ncPo'
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertTrue(subtitles['it'] is not None)
def test_youtube_nosubtitles(self):
self.DL.expect_warning(u'video doesn\'t have subtitles')
self.url = 'sAjKT8FhjI8'
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles), 0)
def test_youtube_multiple_langs(self):
self.url = 'QRS8MkLhQmM'
self.DL.params['writesubtitles'] = True
langs = ['it', 'fr', 'de']
self.DL.params['subtitleslangs'] = langs
subtitles = self.getSubtitles()
for lang in langs:
self.assertTrue(subtitles.get(lang) is not None, u'Subtitles for \'%s\' not extracted' % lang)
if __name__ == '__main__':
unittest.main()

8
tox.ini Normal file
View File

@@ -0,0 +1,8 @@
[tox]
envlist = py26,py27,py33
[testenv]
deps =
nose
coverage
commands = nosetests --verbose {posargs:test} # --with-coverage --cover-package=youtube_dl --cover-html
# test.test_download:TestDownload.test_NowVideo

View File

@@ -63,32 +63,57 @@ class FileDownloader(object):
converted = float(bytes) / float(1024 ** exponent)
return '%.2f%s' % (converted, suffix)
@staticmethod
def format_seconds(seconds):
(mins, secs) = divmod(seconds, 60)
(hours, mins) = divmod(mins, 60)
if hours > 99:
return '--:--:--'
if hours == 0:
return '%02d:%02d' % (mins, secs)
else:
return '%02d:%02d:%02d' % (hours, mins, secs)
@staticmethod
def calc_percent(byte_counter, data_len):
if data_len is None:
return None
return float(byte_counter) / float(data_len) * 100.0
@staticmethod
def format_percent(percent):
if percent is None:
return '---.-%'
return '%6s' % ('%3.1f%%' % (float(byte_counter) / float(data_len) * 100.0))
return '%6s' % ('%3.1f%%' % percent)
@staticmethod
def calc_eta(start, now, total, current):
if total is None:
return '--:--'
return None
dif = now - start
if current == 0 or dif < 0.001: # One millisecond
return '--:--'
return None
rate = float(current) / dif
eta = int((float(total) - float(current)) / rate)
(eta_mins, eta_secs) = divmod(eta, 60)
if eta_mins > 99:
return int((float(total) - float(current)) / rate)
@staticmethod
def format_eta(eta):
if eta is None:
return '--:--'
return '%02d:%02d' % (eta_mins, eta_secs)
return FileDownloader.format_seconds(eta)
@staticmethod
def calc_speed(start, now, bytes):
dif = now - start
if bytes == 0 or dif < 0.001: # One millisecond
return None
return float(bytes) / dif
@staticmethod
def format_speed(speed):
if speed is None:
return '%10s' % '---b/s'
return '%10s' % ('%s/s' % FileDownloader.format_bytes(float(bytes) / dif))
return '%10s' % ('%s/s' % FileDownloader.format_bytes(speed))
@staticmethod
def best_block_size(elapsed_time, bytes):
@@ -197,11 +222,14 @@ class FileDownloader(object):
"""Report destination filename."""
self.to_screen(u'[download] Destination: ' + filename)
def report_progress(self, percent_str, data_len_str, speed_str, eta_str):
def report_progress(self, percent, data_len_str, speed, eta):
"""Report download progress."""
if self.params.get('noprogress', False):
return
clear_line = (u'\x1b[K' if sys.stderr.isatty() and os.name != 'nt' else u'')
eta_str = self.format_eta(eta)
percent_str = self.format_percent(percent)
speed_str = self.format_speed(speed)
if self.params.get('progress_with_newline', False):
self.to_screen(u'[download] %s of %s at %s ETA %s' %
(percent_str, data_len_str, speed_str, eta_str))
@@ -230,16 +258,19 @@ class FileDownloader(object):
"""Report it was impossible to resume download."""
self.to_screen(u'[download] Unable to resume')
def report_finish(self):
def report_finish(self, data_len_str, tot_time):
"""Report download finished."""
if self.params.get('noprogress', False):
self.to_screen(u'[download] Download completed')
else:
self.to_screen(u'')
clear_line = (u'\x1b[K' if sys.stderr.isatty() and os.name != 'nt' else u'')
self.to_screen(u'\r%s[download] 100%% of %s in %s' %
(clear_line, data_len_str, self.format_seconds(tot_time)))
def _download_with_rtmpdump(self, filename, url, player_url, page_url, play_path, tc_url):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
test = self.params.get('test', False)
# Check for rtmpdump first
try:
@@ -261,6 +292,8 @@ class FileDownloader(object):
basic_args += ['--playpath', play_path]
if tc_url is not None:
basic_args += ['--tcUrl', url]
if test:
basic_args += ['--stop', '1']
args = basic_args + [[], ['--resume', '--skip', '1']][self.params.get('continuedl', False)]
if self.params.get('verbose', False):
try:
@@ -270,7 +303,7 @@ class FileDownloader(object):
shell_quote = repr
self.to_screen(u'[debug] rtmpdump command line: ' + shell_quote(args))
retval = subprocess.call(args)
while retval == 2 or retval == 1:
while (retval == 2 or retval == 1) and not test:
prevsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[rtmpdump] %s bytes' % prevsize, skip_eol=True)
time.sleep(5.0) # This seems to be needed
@@ -283,7 +316,7 @@ class FileDownloader(object):
self.to_screen(u'\r[rtmpdump] Could not download the whole video. This can happen for some advertisements.')
retval = 0
break
if retval == 0:
if retval == 0 or (test and retval == 2):
fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[rtmpdump] %s bytes' % fsize)
self.try_rename(tmpfilename, filename)
@@ -329,6 +362,35 @@ class FileDownloader(object):
self.report_error(u'mplayer exited with code %d' % retval)
return False
def _download_m3u8_with_ffmpeg(self, filename, url):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
args = ['ffmpeg', '-y', '-i', url, '-f', 'mp4', tmpfilename]
# Check for ffmpeg first
try:
subprocess.call(['ffmpeg', '-h'], stdout=(open(os.path.devnull, 'w')), stderr=subprocess.STDOUT)
except (OSError, IOError):
self.report_error(u'm3u8 download detected but "%s" could not be run' % args[0] )
return False
retval = subprocess.call(args)
if retval == 0:
fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen(u'\r[%s] %s bytes' % (args[0], fsize))
self.try_rename(tmpfilename, filename)
self._hook_progress({
'downloaded_bytes': fsize,
'total_bytes': fsize,
'filename': filename,
'status': 'finished',
})
return True
else:
self.to_stderr(u"\n")
self.report_error(u'ffmpeg exited with code %d' % retval)
return False
def _do_download(self, filename, info_dict):
url = info_dict['url']
@@ -339,6 +401,7 @@ class FileDownloader(object):
self._hook_progress({
'filename': filename,
'status': 'finished',
'total_bytes': os.path.getsize(encodeFilename(filename)),
})
return True
@@ -354,6 +417,10 @@ class FileDownloader(object):
if url.startswith('mms') or url.startswith('rtsp'):
return self._download_with_mplayer(filename, url)
# m3u8 manifest are downloaded with ffmpeg
if determine_ext(url) == u'm3u8':
return self._download_m3u8_with_ffmpeg(filename, url)
tmpfilename = self.temp_name(filename)
stream = None
@@ -481,13 +548,14 @@ class FileDownloader(object):
block_size = self.best_block_size(after - before, len(data_block))
# Progress message
speed_str = self.calc_speed(start, time.time(), byte_counter - resume_len)
speed = self.calc_speed(start, time.time(), byte_counter - resume_len)
if data_len is None:
self.report_progress('Unknown %', data_len_str, speed_str, 'Unknown ETA')
eta = None
else:
percent_str = self.calc_percent(byte_counter, data_len)
eta_str = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len)
self.report_progress(percent_str, data_len_str, speed_str, eta_str)
percent = self.calc_percent(byte_counter, data_len)
eta = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len)
self.report_progress(percent, data_len_str, speed, eta)
self._hook_progress({
'downloaded_bytes': byte_counter,
@@ -495,6 +563,8 @@ class FileDownloader(object):
'tmpfilename': tmpfilename,
'filename': filename,
'status': 'downloading',
'eta': eta,
'speed': speed,
})
# Apply rate limit
@@ -505,7 +575,7 @@ class FileDownloader(object):
self.report_error(u'Did not get any data blocks')
return False
stream.close()
self.report_finish()
self.report_finish(data_len_str, (time.time() - start))
if data_len is not None and byte_counter != data_len:
raise ContentTooShortError(byte_counter, int(data_len))
self.try_rename(tmpfilename, filename)
@@ -537,6 +607,8 @@ class FileDownloader(object):
* downloaded_bytes: Bytes on disks
* total_bytes: Total bytes, None if unknown
* tmpfilename: The filename we're currently writing to
* eta: The estimated time in seconds, None if unknown
* speed: The download speed in bytes/second, None if unknown
Hooks are guaranteed to be called at least once (with status "finished")
if the download is successful.

View File

@@ -3,7 +3,14 @@ import subprocess
import sys
import time
from .utils import *
from .utils import (
compat_subprocess_get_DEVNULL,
encodeFilename,
PostProcessingError,
shell_quote,
subtitles_filename,
)
class PostProcessor(object):
@@ -71,12 +78,19 @@ class FFmpegPostProcessor(PostProcessor):
programs = ['avprobe', 'avconv', 'ffmpeg', 'ffprobe']
return dict((program, executable(program)) for program in programs)
def run_ffmpeg(self, path, out_path, opts):
def run_ffmpeg_multiple_files(self, input_paths, out_path, opts):
if not self._exes['ffmpeg'] and not self._exes['avconv']:
raise FFmpegPostProcessorError(u'ffmpeg or avconv not found. Please install one.')
cmd = ([self._exes['avconv'] or self._exes['ffmpeg'], '-y', '-i', encodeFilename(path)]
files_cmd = []
for path in input_paths:
files_cmd.extend(['-i', encodeFilename(path)])
cmd = ([self._exes['avconv'] or self._exes['ffmpeg'], '-y'] + files_cmd
+ opts +
[encodeFilename(self._ffmpeg_filename_argument(out_path))])
if self._downloader.params.get('verbose', False):
self._downloader.to_screen(u'[debug] ffmpeg command line: %s' % shell_quote(cmd))
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout,stderr = p.communicate()
if p.returncode != 0:
@@ -84,6 +98,9 @@ class FFmpegPostProcessor(PostProcessor):
msg = stderr.strip().split('\n')[-1]
raise FFmpegPostProcessorError(msg)
def run_ffmpeg(self, path, out_path, opts):
self.run_ffmpeg_multiple_files([path], out_path, opts)
def _ffmpeg_filename_argument(self, fn):
# ffmpeg broke --, see https://ffmpeg.org/trac/ffmpeg/ticket/2127 for details
if fn.startswith(u'-'):
@@ -100,7 +117,8 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
self._nopostoverwrites = nopostoverwrites
def get_audio_codec(self, path):
if not self._exes['ffprobe'] and not self._exes['avprobe']: return None
if not self._exes['ffprobe'] and not self._exes['avprobe']:
raise PostProcessingError(u'ffprobe or avprobe not found. Please install one.')
try:
cmd = [self._exes['avprobe'] or self._exes['ffprobe'], '-show_streams', encodeFilename(self._ffmpeg_filename_argument(path))]
handle = subprocess.Popen(cmd, stderr=compat_subprocess_get_DEVNULL(), stdout=subprocess.PIPE)
@@ -128,7 +146,7 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
try:
FFmpegPostProcessor.run_ffmpeg(self, path, out_path, opts)
except FFmpegPostProcessorError as err:
raise AudioConversionError(err.message)
raise AudioConversionError(err.msg)
def run(self, information):
path = information['filepath']
@@ -168,7 +186,8 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
extension = self._preferredcodec
more_opts = []
if self._preferredquality is not None:
if int(self._preferredquality) < 10:
# The opus codec doesn't support the -aq option
if int(self._preferredquality) < 10 and extension != 'opus':
more_opts += [self._exes['avconv'] and '-q:a' or '-aq', self._preferredquality]
else:
more_opts += [self._exes['avconv'] and '-b:a' or '-ab', self._preferredquality + 'k']
@@ -198,7 +217,7 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
except:
etype,e,tb = sys.exc_info()
if isinstance(e, AudioConversionError):
msg = u'audio conversion failed: ' + e.message
msg = u'audio conversion failed: ' + e.msg
else:
msg = u'error running ' + (self._exes['avconv'] and 'avconv' or 'ffmpeg')
raise PostProcessingError(msg)
@@ -208,7 +227,7 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
try:
os.utime(encodeFilename(new_path), (time.time(), information['filetime']))
except:
self._downloader.to_stderr(u'WARNING: Cannot update utime of audio file')
self._downloader.report_warning(u'Cannot update utime of audio file')
information['filepath'] = new_path
return self._nopostoverwrites,information
@@ -231,3 +250,262 @@ class FFmpegVideoConvertor(FFmpegPostProcessor):
information['format'] = self._preferedformat
information['ext'] = self._preferedformat
return False,information
class FFmpegEmbedSubtitlePP(FFmpegPostProcessor):
# See http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt
_lang_map = {
'aa': 'aar',
'ab': 'abk',
'ae': 'ave',
'af': 'afr',
'ak': 'aka',
'am': 'amh',
'an': 'arg',
'ar': 'ara',
'as': 'asm',
'av': 'ava',
'ay': 'aym',
'az': 'aze',
'ba': 'bak',
'be': 'bel',
'bg': 'bul',
'bh': 'bih',
'bi': 'bis',
'bm': 'bam',
'bn': 'ben',
'bo': 'bod',
'br': 'bre',
'bs': 'bos',
'ca': 'cat',
'ce': 'che',
'ch': 'cha',
'co': 'cos',
'cr': 'cre',
'cs': 'ces',
'cu': 'chu',
'cv': 'chv',
'cy': 'cym',
'da': 'dan',
'de': 'deu',
'dv': 'div',
'dz': 'dzo',
'ee': 'ewe',
'el': 'ell',
'en': 'eng',
'eo': 'epo',
'es': 'spa',
'et': 'est',
'eu': 'eus',
'fa': 'fas',
'ff': 'ful',
'fi': 'fin',
'fj': 'fij',
'fo': 'fao',
'fr': 'fra',
'fy': 'fry',
'ga': 'gle',
'gd': 'gla',
'gl': 'glg',
'gn': 'grn',
'gu': 'guj',
'gv': 'glv',
'ha': 'hau',
'he': 'heb',
'hi': 'hin',
'ho': 'hmo',
'hr': 'hrv',
'ht': 'hat',
'hu': 'hun',
'hy': 'hye',
'hz': 'her',
'ia': 'ina',
'id': 'ind',
'ie': 'ile',
'ig': 'ibo',
'ii': 'iii',
'ik': 'ipk',
'io': 'ido',
'is': 'isl',
'it': 'ita',
'iu': 'iku',
'ja': 'jpn',
'jv': 'jav',
'ka': 'kat',
'kg': 'kon',
'ki': 'kik',
'kj': 'kua',
'kk': 'kaz',
'kl': 'kal',
'km': 'khm',
'kn': 'kan',
'ko': 'kor',
'kr': 'kau',
'ks': 'kas',
'ku': 'kur',
'kv': 'kom',
'kw': 'cor',
'ky': 'kir',
'la': 'lat',
'lb': 'ltz',
'lg': 'lug',
'li': 'lim',
'ln': 'lin',
'lo': 'lao',
'lt': 'lit',
'lu': 'lub',
'lv': 'lav',
'mg': 'mlg',
'mh': 'mah',
'mi': 'mri',
'mk': 'mkd',
'ml': 'mal',
'mn': 'mon',
'mr': 'mar',
'ms': 'msa',
'mt': 'mlt',
'my': 'mya',
'na': 'nau',
'nb': 'nob',
'nd': 'nde',
'ne': 'nep',
'ng': 'ndo',
'nl': 'nld',
'nn': 'nno',
'no': 'nor',
'nr': 'nbl',
'nv': 'nav',
'ny': 'nya',
'oc': 'oci',
'oj': 'oji',
'om': 'orm',
'or': 'ori',
'os': 'oss',
'pa': 'pan',
'pi': 'pli',
'pl': 'pol',
'ps': 'pus',
'pt': 'por',
'qu': 'que',
'rm': 'roh',
'rn': 'run',
'ro': 'ron',
'ru': 'rus',
'rw': 'kin',
'sa': 'san',
'sc': 'srd',
'sd': 'snd',
'se': 'sme',
'sg': 'sag',
'si': 'sin',
'sk': 'slk',
'sl': 'slv',
'sm': 'smo',
'sn': 'sna',
'so': 'som',
'sq': 'sqi',
'sr': 'srp',
'ss': 'ssw',
'st': 'sot',
'su': 'sun',
'sv': 'swe',
'sw': 'swa',
'ta': 'tam',
'te': 'tel',
'tg': 'tgk',
'th': 'tha',
'ti': 'tir',
'tk': 'tuk',
'tl': 'tgl',
'tn': 'tsn',
'to': 'ton',
'tr': 'tur',
'ts': 'tso',
'tt': 'tat',
'tw': 'twi',
'ty': 'tah',
'ug': 'uig',
'uk': 'ukr',
'ur': 'urd',
'uz': 'uzb',
've': 'ven',
'vi': 'vie',
'vo': 'vol',
'wa': 'wln',
'wo': 'wol',
'xh': 'xho',
'yi': 'yid',
'yo': 'yor',
'za': 'zha',
'zh': 'zho',
'zu': 'zul',
}
def __init__(self, downloader=None, subtitlesformat='srt'):
super(FFmpegEmbedSubtitlePP, self).__init__(downloader)
self._subformat = subtitlesformat
@classmethod
def _conver_lang_code(cls, code):
"""Convert language code from ISO 639-1 to ISO 639-2/T"""
return cls._lang_map.get(code[:2])
def run(self, information):
if information['ext'] != u'mp4':
self._downloader.to_screen(u'[ffmpeg] Subtitles can only be embedded in mp4 files')
return True, information
if not information.get('subtitles'):
self._downloader.to_screen(u'[ffmpeg] There aren\'t any subtitles to embed')
return True, information
sub_langs = [key for key in information['subtitles']]
filename = information['filepath']
input_files = [filename] + [subtitles_filename(filename, lang, self._subformat) for lang in sub_langs]
opts = ['-map', '0:0', '-map', '0:1', '-c:v', 'copy', '-c:a', 'copy']
for (i, lang) in enumerate(sub_langs):
opts.extend(['-map', '%d:0' % (i+1), '-c:s:%d' % i, 'mov_text'])
lang_code = self._conver_lang_code(lang)
if lang_code is not None:
opts.extend(['-metadata:s:s:%d' % i, 'language=%s' % lang_code])
opts.extend(['-f', 'mp4'])
temp_filename = filename + u'.temp'
self._downloader.to_screen(u'[ffmpeg] Embedding subtitles in \'%s\'' % filename)
self.run_ffmpeg_multiple_files(input_files, temp_filename, opts)
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
return True, information
class FFmpegMetadataPP(FFmpegPostProcessor):
def run(self, info):
metadata = {}
if info.get('title') is not None:
metadata['title'] = info['title']
if info.get('upload_date') is not None:
metadata['date'] = info['upload_date']
if info.get('uploader') is not None:
metadata['artist'] = info['uploader']
elif info.get('uploader_id') is not None:
metadata['artist'] = info['uploader_id']
if not metadata:
self._downloader.to_screen(u'[ffmpeg] There isn\'t any metadata to add')
return True, info
filename = info['filepath']
ext = os.path.splitext(filename)[1][1:]
temp_filename = filename + u'.temp'
options = ['-c', 'copy']
for (name, value) in metadata.items():
options.extend(['-metadata', '%s="%s"' % (name, value)])
options.extend(['-f', ext])
self._downloader.to_screen(u'[ffmpeg] Adding metadata to \'%s\'' % filename)
self.run_ffmpeg(filename, temp_filename, options)
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
return True, info

View File

@@ -3,6 +3,7 @@
from __future__ import absolute_import
import errno
import io
import os
import re
@@ -70,16 +71,26 @@ class YoutubeDL(object):
logtostderr: Log messages to stderr instead of stdout.
writedescription: Write the video description to a .description file
writeinfojson: Write the video description to a .info.json file
writeannotations: Write the video annotations to a .annotations.xml file
writethumbnail: Write the thumbnail image to a file
writesubtitles: Write the video subtitles to a file
writeautomaticsub: Write the automatic subtitles to a file
allsubtitles: Downloads all the subtitles of the video
(requires writesubtitles or writeautomaticsub)
listsubtitles: Lists all available subtitles for the video
subtitlesformat: Subtitle format [srt/sbv/vtt] (default=srt)
subtitleslang: Language of the subtitles to download
subtitleslangs: List of languages of the subtitles to download
keepvideo: Keep the video file after post-processing
daterange: A DateRange object, download only if the upload_date is in the range.
skip_download: Skip the actual download of the video file
cachedir: Location of the cache files in the filesystem.
None to disable filesystem cache.
noplaylist: Download single video instead of a playlist if in doubt.
age_limit: An integer representing the user's age in years.
Unsuitable videos for the given age are skipped.
downloadarchive: File name of a file where all downloads are recorded.
Videos already present in the file are not downloaded
again.
The following parameters are not used by YoutubeDL itself, they are used by
the FileDownloader:
@@ -97,11 +108,23 @@ class YoutubeDL(object):
def __init__(self, params):
"""Create a FileDownloader object with the given options."""
self._ies = []
self._ies_instances = {}
self._pps = []
self._progress_hooks = []
self._download_retcode = 0
self._num_downloads = 0
self._screen_file = [sys.stdout, sys.stderr][params.get('logtostderr', False)]
if (sys.version_info >= (3,) and sys.platform != 'win32' and
sys.getfilesystemencoding() in ['ascii', 'ANSI_X3.4-1968']
and not params['restrictfilenames']):
# On Python 3, the Unicode filesystem API will throw errors (#1474)
self.report_warning(
u'Assuming --restrict-filenames since file system encoding '
u'cannot encode all charactes. '
u'Set the LC_ALL environment variable to fix this.')
params['restrictfilenames'] = True
self.params = params
self.fd = FileDownloader(self, self.params)
@@ -111,8 +134,21 @@ class YoutubeDL(object):
def add_info_extractor(self, ie):
"""Add an InfoExtractor object to the end of the list."""
self._ies.append(ie)
self._ies_instances[ie.ie_key()] = ie
ie.set_downloader(self)
def get_info_extractor(self, ie_key):
"""
Get an instance of an IE with name ie_key, it will try to get one from
the _ies list, if there's no instance it will create a new one and add
it to the extractor list.
"""
ie = self._ies_instances.get(ie_key)
if ie is None:
ie = get_info_extractor(ie_key)()
self.add_info_extractor(ie)
return ie
def add_default_info_extractors(self):
"""
Add the InfoExtractors returned by gen_extractors to the end of the list
@@ -127,14 +163,10 @@ class YoutubeDL(object):
def to_screen(self, message, skip_eol=False):
"""Print message to stdout if not in quiet mode."""
assert type(message) == type(u'')
if not self.params.get('quiet', False):
terminator = [u'\n', u''][skip_eol]
output = message + terminator
if 'b' in getattr(self._screen_file, 'mode', '') or sys.version_info[0] < 3: # Python 2 lies about the mode of sys.stdout/sys.stderr
output = output.encode(preferredencoding(), 'ignore')
self._screen_file.write(output)
self._screen_file.flush()
write_string(output, self._screen_file)
def to_stderr(self, message):
"""Print message to stderr."""
@@ -227,6 +259,10 @@ class YoutubeDL(object):
""" Report that the metadata file has been written """
self.to_screen(u'[info] Video description metadata as JSON to: ' + infofn)
def report_writeannotations(self, annofn):
""" Report that the annotations file has been written. """
self.to_screen(u'[info] Writing video annotations to: ' + annofn)
def report_file_already_downloaded(self, file_name):
"""Report file has already been fully downloaded."""
try:
@@ -264,7 +300,7 @@ class YoutubeDL(object):
self.report_error(u'Erroneous output template')
return None
except ValueError as err:
self.report_error(u'Insufficient system charset ' + repr(preferredencoding()))
self.report_error(u'Error in output template: ' + str(err) + u' (encoding: ' + repr(preferredencoding()) + ')')
return None
def _match_entry(self, info_dict):
@@ -284,6 +320,13 @@ class YoutubeDL(object):
dateRange = self.params.get('daterange', DateRange())
if date not in dateRange:
return u'[download] %s upload date is not in range %s' % (date_from_str(date).isoformat(), dateRange)
age_limit = self.params.get('age_limit')
if age_limit is not None:
if age_limit < info_dict.get('age_limit', 0):
return u'Skipping "' + title + '" because it is age restricted'
if self.in_download_archive(info_dict):
return (u'%(title)s has already been recorded in archive'
% info_dict)
return None
def extract_info(self, url, download=True, ie_key=None, extra_info={}):
@@ -294,9 +337,7 @@ class YoutubeDL(object):
'''
if ie_key:
ie = get_info_extractor(ie_key)()
ie.set_downloader(self)
ies = [ie]
ies = [self.get_info_extractor(ie_key)]
else:
ies = self._ies
@@ -448,7 +489,8 @@ class YoutubeDL(object):
if self.params.get('forceid', False):
compat_print(info_dict['id'])
if self.params.get('forceurl', False):
compat_print(info_dict['url'])
# For RTMP URLs, also include the playpath
compat_print(info_dict['url'] + info_dict.get('play_path', u''))
if self.params.get('forcethumbnail', False) and 'thumbnail' in info_dict:
compat_print(info_dict['thumbnail'])
if self.params.get('forcedescription', False) and 'description' in info_dict:
@@ -479,45 +521,45 @@ class YoutubeDL(object):
self.report_writedescription(descfn)
with io.open(encodeFilename(descfn), 'w', encoding='utf-8') as descfile:
descfile.write(info_dict['description'])
except (KeyError, TypeError):
self.report_warning(u'There\'s no description to write.')
except (OSError, IOError):
self.report_error(u'Cannot write description file ' + descfn)
return
if (self.params.get('writesubtitles', False) or self.params.get('writeautomaticsub')) and 'subtitles' in info_dict and info_dict['subtitles']:
if self.params.get('writeannotations', False):
try:
annofn = filename + u'.annotations.xml'
self.report_writeannotations(annofn)
with io.open(encodeFilename(annofn), 'w', encoding='utf-8') as annofile:
annofile.write(info_dict['annotations'])
except (KeyError, TypeError):
self.report_warning(u'There are no annotations to write.')
except (OSError, IOError):
self.report_error(u'Cannot write annotations file: ' + annofn)
return
subtitles_are_requested = any([self.params.get('writesubtitles', False),
self.params.get('writeautomaticsub')])
if subtitles_are_requested and 'subtitles' in info_dict and info_dict['subtitles']:
# subtitles download errors are already managed as troubles in relevant IE
# that way it will silently go on when used with unsupporting IE
subtitle = info_dict['subtitles'][0]
(sub_error, sub_lang, sub) = subtitle
subtitles = info_dict['subtitles']
sub_format = self.params.get('subtitlesformat')
if sub_error:
self.report_warning("Some error while getting the subtitles")
else:
for sub_lang in subtitles.keys():
sub = subtitles[sub_lang]
if sub is None:
continue
try:
sub_filename = filename.rsplit('.', 1)[0] + u'.' + sub_lang + u'.' + sub_format
sub_filename = subtitles_filename(filename, sub_lang, sub_format)
self.report_writesubtitles(sub_filename)
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile:
subfile.write(sub)
subfile.write(sub)
except (OSError, IOError):
self.report_error(u'Cannot write subtitles file ' + descfn)
return
if self.params.get('allsubtitles', False) and 'subtitles' in info_dict and info_dict['subtitles']:
subtitles = info_dict['subtitles']
sub_format = self.params.get('subtitlesformat')
for subtitle in subtitles:
(sub_error, sub_lang, sub) = subtitle
if sub_error:
self.report_warning("Some error while getting the subtitles")
else:
try:
sub_filename = filename.rsplit('.', 1)[0] + u'.' + sub_lang + u'.' + sub_format
self.report_writesubtitles(sub_filename)
with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile:
subfile.write(sub)
except (OSError, IOError):
self.report_error(u'Cannot write subtitles file ' + descfn)
return
if self.params.get('writeinfojson', False):
infofn = filename + u'.info.json'
self.report_writeinfojson(infofn)
@@ -534,11 +576,15 @@ class YoutubeDL(object):
thumb_filename = filename.rpartition('.')[0] + u'.' + thumb_format
self.to_screen(u'[%s] %s: Downloading thumbnail ...' %
(info_dict['extractor'], info_dict['id']))
uf = compat_urllib_request.urlopen(info_dict['thumbnail'])
with open(thumb_filename, 'wb') as thumbf:
shutil.copyfileobj(uf, thumbf)
self.to_screen(u'[%s] %s: Writing thumbnail to: %s' %
(info_dict['extractor'], info_dict['id'], thumb_filename))
try:
uf = compat_urllib_request.urlopen(info_dict['thumbnail'])
with open(thumb_filename, 'wb') as thumbf:
shutil.copyfileobj(uf, thumbf)
self.to_screen(u'[%s] %s: Writing thumbnail to: %s' %
(info_dict['extractor'], info_dict['id'], thumb_filename))
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.report_warning(u'Unable to download thumbnail "%s": %s' %
(info_dict['thumbnail'], compat_str(err)))
if not self.params.get('skip_download', False):
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(filename)):
@@ -546,11 +592,11 @@ class YoutubeDL(object):
else:
try:
success = self.fd._do_download(filename, info_dict)
except (OSError, IOError) as err:
raise UnavailableVideoError()
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.report_error(u'unable to download video data: %s' % str(err))
return
except (OSError, IOError) as err:
raise UnavailableVideoError(err)
except (ContentTooShortError, ) as err:
self.report_error(u'content too short (expected %s bytes and served %s)' % (err.expected, err.downloaded))
return
@@ -562,6 +608,8 @@ class YoutubeDL(object):
self.report_error(u'postprocessing: %s' % str(err))
return
self.record_download_archive(info_dict)
def download(self, url_list):
"""Download a given list of URLs."""
if len(url_list) > 1 and self.fixed_template():
@@ -594,10 +642,33 @@ class YoutubeDL(object):
# No clear decision yet, let IE decide
keep_video = keep_video_wish
except PostProcessingError as e:
self.to_stderr(u'ERROR: ' + e.msg)
self.report_error(e.msg)
if keep_video is False and not self.params.get('keepvideo', False):
try:
self.to_screen(u'Deleting original file %s (pass -k to keep)' % filename)
os.remove(encodeFilename(filename))
except (IOError, OSError):
self.report_warning(u'Unable to remove downloaded video file')
def in_download_archive(self, info_dict):
fn = self.params.get('download_archive')
if fn is None:
return False
vid_id = info_dict['extractor'] + u' ' + info_dict['id']
try:
with locked_file(fn, 'r', encoding='utf-8') as archive_file:
for line in archive_file:
if line.strip() == vid_id:
return True
except IOError as ioe:
if ioe.errno != errno.ENOENT:
raise
return False
def record_download_archive(self, info_dict):
fn = self.params.get('download_archive')
if fn is None:
return
vid_id = info_dict['extractor'] + u' ' + info_dict['id']
with locked_file(fn, 'a', encoding='utf-8') as archive_file:
archive_file.write(vid_id + u'\n')

View File

@@ -27,11 +27,17 @@ __authors__ = (
'Johny Mo Swag',
'Axel Noack',
'Albert Kim',
'Pierre Rudloff',
'Huarong Huo',
'Ismael Mejía',
'Steffan \'Ruirize\' James',
'Andras Elso',
)
__license__ = 'Public Domain'
import codecs
import collections
import getpass
import optparse
import os
@@ -41,16 +47,43 @@ import shlex
import socket
import subprocess
import sys
import warnings
import traceback
import platform
from .utils import *
from .utils import (
compat_cookiejar,
compat_print,
compat_str,
compat_urllib_request,
DateRange,
decodeOption,
determine_ext,
DownloadError,
get_cachedir,
make_HTTPS_handler,
MaxDownloadsReached,
platform_name,
preferredencoding,
SameFileError,
std_headers,
write_string,
YoutubeDLHandler,
)
from .update import update_self
from .version import __version__
from .FileDownloader import *
from .FileDownloader import (
FileDownloader,
)
from .extractor import gen_extractors
from .YoutubeDL import YoutubeDL
from .PostProcessor import *
from .PostProcessor import (
FFmpegMetadataPP,
FFmpegVideoConvertor,
FFmpegExtractAudioPP,
FFmpegEmbedSubtitlePP,
)
def parseOpts(overrideArguments=None):
def _readOptions(filename_bytes):
@@ -82,6 +115,9 @@ def parseOpts(overrideArguments=None):
return "".join(opts)
def _comma_separated_values_options_callback(option, opt_str, value, parser):
setattr(parser.values, option.dest, value.split(','))
def _find_term_columns():
columns = os.environ.get('COLUMNS', None)
if columns:
@@ -95,6 +131,16 @@ def parseOpts(overrideArguments=None):
pass
return None
def _hide_login_info(opts):
opts = list(opts)
for private_opt in ['-p', '--password', '-u', '--username']:
try:
i = opts.index(private_opt)
opts[i+1] = '<PRIVATE>'
except ValueError:
pass
return opts
max_width = 80
max_help_position = 80
@@ -119,6 +165,7 @@ def parseOpts(overrideArguments=None):
selection = optparse.OptionGroup(parser, 'Video Selection')
authentication = optparse.OptionGroup(parser, 'Authentication Options')
video_format = optparse.OptionGroup(parser, 'Video Format Options')
subtitles = optparse.OptionGroup(parser, 'Subtitle Options')
downloader = optparse.OptionGroup(parser, 'Download Options')
postproc = optparse.OptionGroup(parser, 'Post-processing Options')
filesystem = optparse.OptionGroup(parser, 'Filesystem Options')
@@ -131,7 +178,7 @@ def parseOpts(overrideArguments=None):
general.add_option('-U', '--update',
action='store_true', dest='update_self', help='update this program to latest version. Make sure that you have sufficient permissions (run with sudo if needed)')
general.add_option('-i', '--ignore-errors',
action='store_true', dest='ignoreerrors', help='continue on download errors', default=False)
action='store_true', dest='ignoreerrors', help='continue on download errors, for example to to skip unavailable videos in a playlist', default=False)
general.add_option('--dump-user-agent',
action='store_true', dest='dump_user_agent',
help='display the current browser identification', default=False)
@@ -148,6 +195,12 @@ def parseOpts(overrideArguments=None):
help='Output descriptions of all supported extractors', default=False)
general.add_option('--proxy', dest='proxy', default=None, help='Use the specified HTTP/HTTPS proxy', metavar='URL')
general.add_option('--no-check-certificate', action='store_true', dest='no_check_certificate', default=False, help='Suppress HTTPS certificate validation.')
general.add_option(
'--cache-dir', dest='cachedir', default=get_cachedir(),
help='Location in the filesystem where youtube-dl can store downloaded information permanently. By default $XDG_CACHE_HOME/youtube-dl or ~/.cache/youtube-dl .')
general.add_option(
'--no-cache-dir', action='store_const', const=None, dest='cachedir',
help='Disable filesystem caching')
selection.add_option('--playlist-start',
@@ -162,6 +215,13 @@ def parseOpts(overrideArguments=None):
selection.add_option('--date', metavar='DATE', dest='date', help='download only videos uploaded in this date', default=None)
selection.add_option('--datebefore', metavar='DATE', dest='datebefore', help='download only videos uploaded before this date', default=None)
selection.add_option('--dateafter', metavar='DATE', dest='dateafter', help='download only videos uploaded after this date', default=None)
selection.add_option('--no-playlist', action='store_true', dest='noplaylist', help='download only the currently playing video', default=False)
selection.add_option('--age-limit', metavar='YEARS', dest='age_limit',
help='download only videos suitable for the given age',
default=None, type=int)
selection.add_option('--download-archive', metavar='FILE',
dest='download_archive',
help='Download only videos not present in the archive file. Record all downloaded videos in it.')
authentication.add_option('-u', '--username',
@@ -176,7 +236,7 @@ def parseOpts(overrideArguments=None):
video_format.add_option('-f', '--format',
action='store', dest='format', metavar='FORMAT',
help='video format code, specifiy the order of preference using slashes: "-f 22/17/18"')
help='video format code, specifiy the order of preference using slashes: "-f 22/17/18". "-f mp4" and "-f flv" are also supported')
video_format.add_option('--all-formats',
action='store_const', dest='format', help='download all available video formats', const='all')
video_format.add_option('--prefer-free-formats',
@@ -185,34 +245,33 @@ def parseOpts(overrideArguments=None):
action='store', dest='format_limit', metavar='FORMAT', help='highest quality format to download')
video_format.add_option('-F', '--list-formats',
action='store_true', dest='listformats', help='list all available formats (currently youtube only)')
video_format.add_option('--write-sub', '--write-srt',
subtitles.add_option('--write-sub', '--write-srt',
action='store_true', dest='writesubtitles',
help='write subtitle file (currently youtube only)', default=False)
video_format.add_option('--write-auto-sub', '--write-automatic-sub',
help='write subtitle file', default=False)
subtitles.add_option('--write-auto-sub', '--write-automatic-sub',
action='store_true', dest='writeautomaticsub',
help='write automatic subtitle file (currently youtube only)', default=False)
video_format.add_option('--only-sub',
action='store_true', dest='skip_download',
help='[deprecated] alias of --skip-download', default=False)
video_format.add_option('--all-subs',
help='write automatic subtitle file (youtube only)', default=False)
subtitles.add_option('--all-subs',
action='store_true', dest='allsubtitles',
help='downloads all the available subtitles of the video (currently youtube only)', default=False)
video_format.add_option('--list-subs',
help='downloads all the available subtitles of the video', default=False)
subtitles.add_option('--list-subs',
action='store_true', dest='listsubtitles',
help='lists all available subtitles for the video (currently youtube only)', default=False)
video_format.add_option('--sub-format',
help='lists all available subtitles for the video', default=False)
subtitles.add_option('--sub-format',
action='store', dest='subtitlesformat', metavar='FORMAT',
help='subtitle format [srt/sbv/vtt] (default=srt) (currently youtube only)', default='srt')
video_format.add_option('--sub-lang', '--srt-lang',
action='store', dest='subtitleslang', metavar='LANG',
help='language of the subtitles to download (optional) use IETF language tags like \'en\'')
help='subtitle format (default=srt) ([sbv/vtt] youtube only)', default='srt')
subtitles.add_option('--sub-lang', '--sub-langs', '--srt-lang',
action='callback', dest='subtitleslangs', metavar='LANGS', type='str',
default=[], callback=_comma_separated_values_options_callback,
help='languages of the subtitles to download (optional) separated by commas, use IETF language tags like \'en,pt\'')
downloader.add_option('-r', '--rate-limit',
dest='ratelimit', metavar='LIMIT', help='maximum download rate (e.g. 50k or 44.6m)')
dest='ratelimit', metavar='LIMIT', help='maximum download rate in bytes per second (e.g. 50K or 4.2M)')
downloader.add_option('-R', '--retries',
dest='retries', metavar='RETRIES', help='number of retries (default is %default)', default=10)
downloader.add_option('--buffer-size',
dest='buffersize', metavar='SIZE', help='size of download buffer (e.g. 1024 or 16k) (default is %default)', default="1024")
dest='buffersize', metavar='SIZE', help='size of download buffer (e.g. 1024 or 16K) (default is %default)', default="1024")
downloader.add_option('--no-resize-buffer',
action='store_true', dest='noresizebuffer',
help='do not automatically adjust the buffer size. By default, the buffer size is automatically resized from an initial value of SIZE.', default=False)
@@ -254,6 +313,10 @@ def parseOpts(overrideArguments=None):
verbosity.add_option('--dump-intermediate-pages',
action='store_true', dest='dump_intermediate_pages', default=False,
help='print downloaded pages to debug problems(very verbose)')
verbosity.add_option('--youtube-print-sig-code',
action='store_true', dest='youtube_print_sig_code', default=False,
help=optparse.SUPPRESS_HELP)
filesystem.add_option('-t', '--title',
action='store_true', dest='usetitle', help='use title in file name (default)', default=False)
@@ -303,6 +366,9 @@ def parseOpts(overrideArguments=None):
filesystem.add_option('--write-info-json',
action='store_true', dest='writeinfojson',
help='write video metadata to a .info.json file', default=False)
filesystem.add_option('--write-annotations',
action='store_true', dest='writeannotations',
help='write video annotations to a .annotation file', default=False)
filesystem.add_option('--write-thumbnail',
action='store_true', dest='writethumbnail',
help='write thumbnail image to disk', default=False)
@@ -320,6 +386,10 @@ def parseOpts(overrideArguments=None):
help='keeps the video file on disk after the post-processing; the video is erased by default')
postproc.add_option('--no-post-overwrites', action='store_true', dest='nopostoverwrites', default=False,
help='do not overwrite post-processed files; the post-processed files are overwritten by default')
postproc.add_option('--embed-subs', action='store_true', dest='embedsubtitles', default=False,
help='embed subtitles in the video (only for mp4 videos)')
postproc.add_option('--add-metadata', action='store_true', dest='addmetadata', default=False,
help='add metadata to the files')
parser.add_option_group(general)
@@ -328,28 +398,33 @@ def parseOpts(overrideArguments=None):
parser.add_option_group(filesystem)
parser.add_option_group(verbosity)
parser.add_option_group(video_format)
parser.add_option_group(subtitles)
parser.add_option_group(authentication)
parser.add_option_group(postproc)
if overrideArguments is not None:
opts, args = parser.parse_args(overrideArguments)
if opts.verbose:
sys.stderr.write(u'[debug] Override config: ' + repr(overrideArguments) + '\n')
write_string(u'[debug] Override config: ' + repr(overrideArguments) + '\n')
else:
xdg_config_home = os.environ.get('XDG_CONFIG_HOME')
if xdg_config_home:
userConfFile = os.path.join(xdg_config_home, 'youtube-dl.conf')
userConfFile = os.path.join(xdg_config_home, 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(xdg_config_home, 'youtube-dl.conf')
else:
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl.conf')
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl', 'config')
if not os.path.isfile(userConfFile):
userConfFile = os.path.join(os.path.expanduser('~'), '.config', 'youtube-dl.conf')
systemConf = _readOptions('/etc/youtube-dl.conf')
userConf = _readOptions(userConfFile)
commandLineConf = sys.argv[1:]
commandLineConf = sys.argv[1:]
argv = systemConf + userConf + commandLineConf
opts, args = parser.parse_args(argv)
if opts.verbose:
sys.stderr.write(u'[debug] System config: ' + repr(systemConf) + '\n')
sys.stderr.write(u'[debug] User config: ' + repr(userConf) + '\n')
sys.stderr.write(u'[debug] Command-line args: ' + repr(commandLineConf) + '\n')
write_string(u'[debug] System config: ' + repr(_hide_login_info(systemConf)) + '\n')
write_string(u'[debug] User config: ' + repr(_hide_login_info(userConf)) + '\n')
write_string(u'[debug] Command-line args: ' + repr(_hide_login_info(commandLineConf)) + '\n')
return parser, opts, args
@@ -372,12 +447,12 @@ def _real_main(argv=None):
except (IOError, OSError) as err:
if opts.verbose:
traceback.print_exc()
sys.stderr.write(u'ERROR: unable to open cookie file\n')
write_string(u'ERROR: unable to open cookie file\n')
sys.exit(101)
# Set user agent
if opts.user_agent is not None:
std_headers['User-Agent'] = opts.user_agent
# Set referer
if opts.referer is not None:
std_headers['Referer'] = opts.referer
@@ -398,28 +473,14 @@ def _real_main(argv=None):
batchurls = batchfd.readlines()
batchurls = [x.strip() for x in batchurls]
batchurls = [x for x in batchurls if len(x) > 0 and not re.search(r'^[#/;]', x)]
if opts.verbose:
write_string(u'[debug] Batch file urls: ' + repr(batchurls) + u'\n')
except IOError:
sys.exit(u'ERROR: batch file could not be read')
all_urls = batchurls + args
all_urls = [url.strip() for url in all_urls]
# General configuration
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
if opts.proxy is not None:
if opts.proxy == '':
proxies = {}
else:
proxies = {'http': opts.proxy, 'https': opts.proxy}
else:
proxies = compat_urllib_request.getproxies()
# Set HTTPS proxy to HTTP one if given (https://github.com/rg3/youtube-dl/issues/805)
if 'http' in proxies and 'https' not in proxies:
proxies['https'] = proxies['http']
proxy_handler = compat_urllib_request.ProxyHandler(proxies)
https_handler = make_HTTPS_handler(opts)
opener = compat_urllib_request.build_opener(https_handler, proxy_handler, cookie_processor, YoutubeDLHandler())
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(300) # 5 minutes should be enough (famous last words)
opener = _setup_opener(jar=jar, opts=opts)
extractors = gen_extractors()
@@ -436,6 +497,8 @@ def _real_main(argv=None):
if not ie._WORKING:
continue
desc = getattr(ie, 'IE_DESC', ie.IE_NAME)
if desc is False:
continue
if hasattr(ie, 'SEARCH_KEY'):
_SEARCHES = (u'cute kittens', u'slithering pythons', u'falling cat', u'angry poodle', u'purple fish', u'running tortoise')
_COUNTS = (u'', u'5', u'10', u'all')
@@ -507,6 +570,11 @@ def _real_main(argv=None):
else:
date = DateRange(opts.dateafter, opts.datebefore)
# --all-sub automatically sets --write-sub if --write-auto-sub is not given
# this was the old behaviour if only --all-sub was given.
if opts.allsubtitles and (opts.writeautomaticsub == False):
opts.writesubtitles = True
if sys.version_info < (3,):
# In Python 2, sys.argv is a bytestring (also note http://bugs.python.org/issue2128 for Windows systems)
if opts.outtmpl is not None:
@@ -519,6 +587,10 @@ def _real_main(argv=None):
or (opts.useid and u'%(id)s.%(ext)s')
or (opts.autonumber and u'%(autonumber)s-%(id)s.%(ext)s')
or u'%(title)s-%(id)s.%(ext)s')
if '%(ext)s' not in outtmpl and opts.extractaudio:
parser.error(u'Cannot download a video and extract audio into the same'
u' file! Use "%%(ext)s" instead of %r' %
determine_ext(outtmpl, u''))
# YoutubeDL
ydl = YoutubeDL({
@@ -553,11 +625,13 @@ def _real_main(argv=None):
'progress_with_newline': opts.progress_with_newline,
'playliststart': opts.playliststart,
'playlistend': opts.playlistend,
'noplaylist': opts.noplaylist,
'logtostderr': opts.outtmpl == '-',
'consoletitle': opts.consoletitle,
'nopart': opts.nopart,
'updatetime': opts.updatetime,
'writedescription': opts.writedescription,
'writeannotations': opts.writeannotations,
'writeinfojson': opts.writeinfojson,
'writethumbnail': opts.writethumbnail,
'writesubtitles': opts.writesubtitles,
@@ -565,7 +639,7 @@ def _real_main(argv=None):
'allsubtitles': opts.allsubtitles,
'listsubtitles': opts.listsubtitles,
'subtitlesformat': opts.subtitlesformat,
'subtitleslang': opts.subtitleslang,
'subtitleslangs': opts.subtitleslangs,
'matchtitle': decodeOption(opts.matchtitle),
'rejecttitle': decodeOption(opts.rejecttitle),
'max_downloads': opts.max_downloads,
@@ -577,10 +651,14 @@ def _real_main(argv=None):
'min_filesize': opts.min_filesize,
'max_filesize': opts.max_filesize,
'daterange': date,
'cachedir': opts.cachedir,
'youtube_print_sig_code': opts.youtube_print_sig_code,
'age_limit': opts.age_limit,
'download_archive': opts.download_archive,
})
if opts.verbose:
sys.stderr.write(u'[debug] youtube-dl version ' + __version__ + u'\n')
write_string(u'[debug] youtube-dl version ' + __version__ + u'\n')
try:
sp = subprocess.Popen(
['git', 'rev-parse', '--short', 'HEAD'],
@@ -589,26 +667,36 @@ def _real_main(argv=None):
out, err = sp.communicate()
out = out.decode().strip()
if re.match('[0-9a-f]+', out):
sys.stderr.write(u'[debug] Git HEAD: ' + out + u'\n')
write_string(u'[debug] Git HEAD: ' + out + u'\n')
except:
try:
sys.exc_clear()
except:
pass
sys.stderr.write(u'[debug] Python version %s - %s' %(platform.python_version(), platform.platform()) + u'\n')
sys.stderr.write(u'[debug] Proxy map: ' + str(proxy_handler.proxies) + u'\n')
write_string(u'[debug] Python version %s - %s' %(platform.python_version(), platform_name()) + u'\n')
proxy_map = {}
for handler in opener.handlers:
if hasattr(handler, 'proxies'):
proxy_map.update(handler.proxies)
write_string(u'[debug] Proxy map: ' + compat_str(proxy_map) + u'\n')
ydl.add_default_info_extractors()
# PostProcessors
# Add the metadata pp first, the other pps will copy it
if opts.addmetadata:
ydl.add_post_processor(FFmpegMetadataPP())
if opts.extractaudio:
ydl.add_post_processor(FFmpegExtractAudioPP(preferredcodec=opts.audioformat, preferredquality=opts.audioquality, nopostoverwrites=opts.nopostoverwrites))
if opts.recodevideo:
ydl.add_post_processor(FFmpegVideoConvertor(preferedformat=opts.recodevideo))
if opts.embedsubtitles:
ydl.add_post_processor(FFmpegEmbedSubtitlePP(subtitlesformat=opts.subtitlesformat))
# Update version
if opts.update_self:
update_self(ydl.to_screen, opts.verbose, sys.argv[0])
update_self(ydl.to_screen, opts.verbose)
# Maybe do nothing
if len(all_urls) < 1:
@@ -627,11 +715,42 @@ def _real_main(argv=None):
if opts.cookiefile is not None:
try:
jar.save()
except (IOError, OSError) as err:
except (IOError, OSError):
sys.exit(u'ERROR: unable to save cookie jar')
sys.exit(retcode)
def _setup_opener(jar=None, opts=None, timeout=300):
if opts is None:
FakeOptions = collections.namedtuple(
'FakeOptions', ['proxy', 'no_check_certificate'])
opts = FakeOptions(proxy=None, no_check_certificate=False)
cookie_processor = compat_urllib_request.HTTPCookieProcessor(jar)
if opts.proxy is not None:
if opts.proxy == '':
proxies = {}
else:
proxies = {'http': opts.proxy, 'https': opts.proxy}
else:
proxies = compat_urllib_request.getproxies()
# Set HTTPS proxy to HTTP one if given (https://github.com/rg3/youtube-dl/issues/805)
if 'http' in proxies and 'https' not in proxies:
proxies['https'] = proxies['http']
proxy_handler = compat_urllib_request.ProxyHandler(proxies)
https_handler = make_HTTPS_handler(opts)
opener = compat_urllib_request.build_opener(
https_handler, proxy_handler, cookie_processor, YoutubeDLHandler())
# Delete the default user-agent header, which would otherwise apply in
# cases where our custom HTTP handler doesn't come into play
# (See https://github.com/rg3/youtube-dl/issues/1309 for details)
opener.addheaders = []
compat_urllib_request.install_opener(opener)
socket.setdefaulttimeout(timeout)
return opener
def main(argv=None):
try:
_real_main(argv)

202
youtube_dl/aes.py Normal file
View File

@@ -0,0 +1,202 @@
__all__ = ['aes_encrypt', 'key_expansion', 'aes_ctr_decrypt', 'aes_decrypt_text']
import base64
from math import ceil
from .utils import bytes_to_intlist, intlist_to_bytes
BLOCK_SIZE_BYTES = 16
def aes_ctr_decrypt(data, key, counter):
"""
Decrypt with aes in counter mode
@param {int[]} data cipher
@param {int[]} key 16/24/32-Byte cipher key
@param {instance} counter Instance whose next_value function (@returns {int[]} 16-Byte block)
returns the next counter block
@returns {int[]} decrypted data
"""
expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
decrypted_data=[]
for i in range(block_count):
counter_block = counter.next_value()
block = data[i*BLOCK_SIZE_BYTES : (i+1)*BLOCK_SIZE_BYTES]
block += [0]*(BLOCK_SIZE_BYTES - len(block))
cipher_counter_block = aes_encrypt(counter_block, expanded_key)
decrypted_data += xor(block, cipher_counter_block)
decrypted_data = decrypted_data[:len(data)]
return decrypted_data
def key_expansion(data):
"""
Generate key schedule
@param {int[]} data 16/24/32-Byte cipher key
@returns {int[]} 176/208/240-Byte expanded key
"""
data = data[:] # copy
rcon_iteration = 1
key_size_bytes = len(data)
expanded_key_size_bytes = (key_size_bytes // 4 + 7) * BLOCK_SIZE_BYTES
while len(data) < expanded_key_size_bytes:
temp = data[-4:]
temp = key_schedule_core(temp, rcon_iteration)
rcon_iteration += 1
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
for _ in range(3):
temp = data[-4:]
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
if key_size_bytes == 32:
temp = data[-4:]
temp = sub_bytes(temp)
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
for _ in range(3 if key_size_bytes == 32 else 2 if key_size_bytes == 24 else 0):
temp = data[-4:]
data += xor(temp, data[-key_size_bytes : 4-key_size_bytes])
data = data[:expanded_key_size_bytes]
return data
def aes_encrypt(data, expanded_key):
"""
Encrypt one block with aes
@param {int[]} data 16-Byte state
@param {int[]} expanded_key 176/208/240-Byte expanded key
@returns {int[]} 16-Byte cipher
"""
rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1
data = xor(data, expanded_key[:BLOCK_SIZE_BYTES])
for i in range(1, rounds+1):
data = sub_bytes(data)
data = shift_rows(data)
if i != rounds:
data = mix_columns(data)
data = xor(data, expanded_key[i*BLOCK_SIZE_BYTES : (i+1)*BLOCK_SIZE_BYTES])
return data
def aes_decrypt_text(data, password, key_size_bytes):
"""
Decrypt text
- The first 8 Bytes of decoded 'data' are the 8 high Bytes of the counter
- The cipher key is retrieved by encrypting the first 16 Byte of 'password'
with the first 'key_size_bytes' Bytes from 'password' (if necessary filled with 0's)
- Mode of operation is 'counter'
@param {str} data Base64 encoded string
@param {str,unicode} password Password (will be encoded with utf-8)
@param {int} key_size_bytes Possible values: 16 for 128-Bit, 24 for 192-Bit or 32 for 256-Bit
@returns {str} Decrypted data
"""
NONCE_LENGTH_BYTES = 8
data = bytes_to_intlist(base64.b64decode(data))
password = bytes_to_intlist(password.encode('utf-8'))
key = password[:key_size_bytes] + [0]*(key_size_bytes - len(password))
key = aes_encrypt(key[:BLOCK_SIZE_BYTES], key_expansion(key)) * (key_size_bytes // BLOCK_SIZE_BYTES)
nonce = data[:NONCE_LENGTH_BYTES]
cipher = data[NONCE_LENGTH_BYTES:]
class Counter:
__value = nonce + [0]*(BLOCK_SIZE_BYTES - NONCE_LENGTH_BYTES)
def next_value(self):
temp = self.__value
self.__value = inc(self.__value)
return temp
decrypted_data = aes_ctr_decrypt(cipher, key, Counter())
plaintext = intlist_to_bytes(decrypted_data)
return plaintext
RCON = (0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36)
SBOX = (0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15,
0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75,
0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84,
0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF,
0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8,
0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2,
0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73,
0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB,
0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79,
0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08,
0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A,
0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E,
0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF,
0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16)
MIX_COLUMN_MATRIX = ((2,3,1,1),
(1,2,3,1),
(1,1,2,3),
(3,1,1,2))
def sub_bytes(data):
return [SBOX[x] for x in data]
def rotate(data):
return data[1:] + [data[0]]
def key_schedule_core(data, rcon_iteration):
data = rotate(data)
data = sub_bytes(data)
data[0] = data[0] ^ RCON[rcon_iteration]
return data
def xor(data1, data2):
return [x^y for x, y in zip(data1, data2)]
def mix_column(data):
data_mixed = []
for row in range(4):
mixed = 0
for column in range(4):
addend = data[column]
if MIX_COLUMN_MATRIX[row][column] in (2,3):
addend <<= 1
if addend > 0xff:
addend &= 0xff
addend ^= 0x1b
if MIX_COLUMN_MATRIX[row][column] == 3:
addend ^= data[column]
mixed ^= addend & 0xff
data_mixed.append(mixed)
return data_mixed
def mix_columns(data):
data_mixed = []
for i in range(4):
column = data[i*4 : (i+1)*4]
data_mixed += mix_column(column)
return data_mixed
def shift_rows(data):
data_shifted = []
for column in range(4):
for row in range(4):
data_shifted.append( data[((column + row) & 0b11) * 4 + row] )
return data_shifted
def inc(data):
data = data[:] # copy
for i in range(len(data)-1,-1,-1):
if data[i] == 255:
data[i] = 0
else:
data[i] = data[i] + 1
break
return data

View File

@@ -1,27 +1,57 @@
from .appletrailers import AppleTrailersIE
from .addanime import AddAnimeIE
from .archiveorg import ArchiveOrgIE
from .ard import ARDIE
from .arte import ArteTvIE
from .arte import (
ArteTvIE,
ArteTVPlus7IE,
ArteTVCreativeIE,
ArteTVFutureIE,
)
from .auengine import AUEngineIE
from .bandcamp import BandcampIE
from .bliptv import BlipTVIE, BlipTVUserIE
from .bloomberg import BloombergIE
from .breakcom import BreakIE
from .brightcove import BrightcoveIE
from .c56 import C56IE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .cinemassacre import CinemassacreIE
from .cnn import CNNIE
from .collegehumor import CollegeHumorIE
from .comedycentral import ComedyCentralIE
from .condenast import CondeNastIE
from .criterion import CriterionIE
from .cspan import CSpanIE
from .dailymotion import DailymotionIE
from .dailymotion import (
DailymotionIE,
DailymotionPlaylistIE,
DailymotionUserIE,
)
from .daum import DaumIE
from .depositfiles import DepositFilesIE
from .dotsub import DotsubIE
from .dreisat import DreiSatIE
from .defense import DefenseGouvFrIE
from .ebaumsworld import EbaumsWorldIE
from .ehow import EHowIE
from .eighttracks import EightTracksIE
from .escapist import EscapistIE
from .exfm import ExfmIE
from .facebook import FacebookIE
from .faz import FazIE
from .fktv import (
FKTVIE,
FKTVPosteckeIE,
)
from .flickr import FlickrIE
from .francetv import (
PluzzIE,
FranceTvInfoIE,
France2IE,
GenerationQuoiIE
)
from .freesound import FreesoundIE
from .funnyordie import FunnyOrDieIE
from .gamespot import GameSpotIE
@@ -29,6 +59,7 @@ from .gametrailers import GametrailersIE
from .generic import GenericIE
from .googleplus import GooglePlusIE
from .googlesearch import GoogleSearchIE
from .hark import HarkIE
from .hotnewhiphop import HotNewHipHopIE
from .howcast import HowcastIE
from .hypem import HypemIE
@@ -36,44 +67,76 @@ from .ign import IGNIE, OneUPIE
from .ina import InaIE
from .infoq import InfoQIE
from .instagram import InstagramIE
from .internetvideoarchive import InternetVideoArchiveIE
from .jeuxvideo import JeuxVideoIE
from .jukebox import JukeboxIE
from .justintv import JustinTVIE
from .kankan import KankanIE
from .kickstarter import KickStarterIE
from .keek import KeekIE
from .liveleak import LiveLeakIE
from .livestream import LivestreamIE
from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
from .mit import TechTVMITIE, MITIE
from .mixcloud import MixcloudIE
from .mtv import MTVIE
from .muzu import MuzuTVIE
from .myspass import MySpassIE
from .myvideo import MyVideoIE
from .naver import NaverIE
from .nba import NBAIE
from .nbc import NBCNewsIE
from .newgrounds import NewgroundsIE
from .nhl import NHLIE, NHLVideocenterIE
from .nowvideo import NowVideoIE
from .ooyala import OoyalaIE
from .orf import ORFIE
from .pbs import PBSIE
from .photobucket import PhotobucketIE
from .pornotube import PornotubeIE
from .rbmaradio import RBMARadioIE
from .redtube import RedTubeIE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtlnow import RTLnowIE
from .sina import SinaIE
from .soundcloud import SoundcloudIE, SoundcloudSetIE
from .slashdot import SlashdotIE
from .slideshare import SlideshareIE
from .sohu import SohuIE
from .soundcloud import SoundcloudIE, SoundcloudSetIE, SoundcloudUserIE
from .southparkstudios import SouthParkStudiosIE
from .spiegel import SpiegelIE
from .stanfordoc import StanfordOpenClassroomIE
from .statigram import StatigramIE
from .steam import SteamIE
from .sztvhu import SztvHuIE
from .teamcoco import TeamcocoIE
from .ted import TEDIE
from .tf1 import TF1IE
from .thisav import ThisAVIE
from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE
from .tudou import TudouIE
from .tumblr import TumblrIE
from .tutv import TutvIE
from .ustream import UstreamIE
from .unistra import UnistraIE
from .ustream import UstreamIE, UstreamChannelIE
from .vbox7 import Vbox7IE
from .veehd import VeeHDIE
from .veoh import VeohIE
from .vevo import VevoIE
from .vimeo import VimeoIE
from .vice import ViceIE
from .viddler import ViddlerIE
from .videodetective import VideoDetectiveIE
from .videofyme import VideofyMeIE
from .videopremium import VideoPremiumIE
from .vimeo import VimeoIE, VimeoChannelIE
from .vine import VineIE
from .c56 import C56IE
from .wat import WatIE
from .websurg import WeBSurgIE
from .weibo import WeiboIE
from .wimp import WimpIE
from .worldstarhiphop import WorldStarHipHopIE
@@ -93,6 +156,9 @@ from .youtube import (
YoutubeShowIE,
YoutubeSubscriptionsIE,
YoutubeRecommendedIE,
YoutubeTruncatedURLIE,
YoutubeWatchLaterIE,
YoutubeFavouritesIE,
)
from .zdf import ZDFIE
@@ -104,12 +170,14 @@ _ALL_CLASSES = [
]
_ALL_CLASSES.append(GenericIE)
def gen_extractors():
""" Return a list of an instance of every supported extractor.
The order does matter; the first extractor matched is the one handling the URL.
"""
return [klass() for klass in _ALL_CLASSES]
def get_info_extractor(ie_name):
"""Returns the info extractor class with the given ie_name"""
return globals()[ie_name+'IE']

View File

@@ -0,0 +1,75 @@
import re
from .common import InfoExtractor
from ..utils import (
compat_HTTPError,
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlparse,
ExtractorError,
)
class AddAnimeIE(InfoExtractor):
_VALID_URL = r'^http://(?:\w+\.)?add-anime\.net/watch_video.php\?(?:.*?)v=(?P<video_id>[\w_]+)(?:.*)'
IE_NAME = u'AddAnime'
_TEST = {
u'url': u'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
u'file': u'24MR3YO5SAS9.flv',
u'md5': u'1036a0e0cd307b95bd8a8c3a5c8cfaf1',
u'info_dict': {
u"description": u"One Piece 606",
u"title": u"One Piece 606"
}
}
def _real_extract(self, url):
try:
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id)
except ExtractorError as ee:
if not isinstance(ee.cause, compat_HTTPError):
raise
redir_webpage = ee.cause.read().decode('utf-8')
action = self._search_regex(
r'<form id="challenge-form" action="([^"]+)"',
redir_webpage, u'Redirect form')
vc = self._search_regex(
r'<input type="hidden" name="jschl_vc" value="([^"]+)"/>',
redir_webpage, u'redirect vc value')
av = re.search(
r'a\.value = ([0-9]+)[+]([0-9]+)[*]([0-9]+);',
redir_webpage)
if av is None:
raise ExtractorError(u'Cannot find redirect math task')
av_res = int(av.group(1)) + int(av.group(2)) * int(av.group(3))
parsed_url = compat_urllib_parse_urlparse(url)
av_val = av_res + len(parsed_url.netloc)
confirm_url = (
parsed_url.scheme + u'://' + parsed_url.netloc +
action + '?' +
compat_urllib_parse.urlencode({
'jschl_vc': vc, 'jschl_answer': compat_str(av_val)}))
self._download_webpage(
confirm_url, video_id,
note=u'Confirming after redirect')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(r"var normal_video_file = '(.*?)';",
webpage, u'video file URL')
video_title = self._og_search_title(webpage)
video_description = self._og_search_description(webpage)
return {
'_type': 'video',
'id': video_id,
'url': video_url,
'ext': 'flv',
'title': video_title,
'description': video_description
}

View File

@@ -0,0 +1,138 @@
import re
import xml.etree.ElementTree
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
determine_ext,
)
class AppleTrailersIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?trailers.apple.com/trailers/(?P<company>[^/]+)/(?P<movie>[^/]+)'
_TEST = {
u"url": u"http://trailers.apple.com/trailers/wb/manofsteel/",
u"playlist": [
{
u"file": u"manofsteel-trailer4.mov",
u"md5": u"d97a8e575432dbcb81b7c3acb741f8a8",
u"info_dict": {
u"duration": 111,
u"title": u"Trailer 4",
u"upload_date": u"20130523",
u"uploader_id": u"wb",
},
},
{
u"file": u"manofsteel-trailer3.mov",
u"md5": u"b8017b7131b721fb4e8d6f49e1df908c",
u"info_dict": {
u"duration": 182,
u"title": u"Trailer 3",
u"upload_date": u"20130417",
u"uploader_id": u"wb",
},
},
{
u"file": u"manofsteel-trailer.mov",
u"md5": u"d0f1e1150989b9924679b441f3404d48",
u"info_dict": {
u"duration": 148,
u"title": u"Trailer",
u"upload_date": u"20121212",
u"uploader_id": u"wb",
},
},
{
u"file": u"manofsteel-teaser.mov",
u"md5": u"5fe08795b943eb2e757fa95cb6def1cb",
u"info_dict": {
u"duration": 93,
u"title": u"Teaser",
u"upload_date": u"20120721",
u"uploader_id": u"wb",
},
}
]
}
_JSON_RE = r'iTunes.playURL\((.*?)\);'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
movie = mobj.group('movie')
uploader_id = mobj.group('company')
playlist_url = compat_urlparse.urljoin(url, u'includes/playlists/itunes.inc')
playlist_snippet = self._download_webpage(playlist_url, movie)
playlist_cleaned = re.sub(r'(?s)<script[^<]*?>.*?</script>', u'', playlist_snippet)
playlist_cleaned = re.sub(r'<img ([^<]*?)>', r'<img \1/>', playlist_cleaned)
# The ' in the onClick attributes are not escaped, it couldn't be parsed
# with xml.etree.ElementTree.fromstring
# like: http://trailers.apple.com/trailers/wb/gravity/
def _clean_json(m):
return u'iTunes.playURL(%s);' % m.group(1).replace('\'', '&#39;')
playlist_cleaned = re.sub(self._JSON_RE, _clean_json, playlist_cleaned)
playlist_html = u'<html>' + playlist_cleaned + u'</html>'
doc = xml.etree.ElementTree.fromstring(playlist_html)
playlist = []
for li in doc.findall('./div/ul/li'):
on_click = li.find('.//a').attrib['onClick']
trailer_info_json = self._search_regex(self._JSON_RE,
on_click, u'trailer info')
trailer_info = json.loads(trailer_info_json)
title = trailer_info['title']
video_id = movie + '-' + re.sub(r'[^a-zA-Z0-9]', '', title).lower()
thumbnail = li.find('.//img').attrib['src']
upload_date = trailer_info['posted'].replace('-', '')
runtime = trailer_info['runtime']
m = re.search(r'(?P<minutes>[0-9]+):(?P<seconds>[0-9]{1,2})', runtime)
duration = None
if m:
duration = 60 * int(m.group('minutes')) + int(m.group('seconds'))
first_url = trailer_info['url']
trailer_id = first_url.split('/')[-1].rpartition('_')[0].lower()
settings_json_url = compat_urlparse.urljoin(url, 'includes/settings/%s.json' % trailer_id)
settings_json = self._download_webpage(settings_json_url, trailer_id, u'Downloading settings json')
settings = json.loads(settings_json)
formats = []
for format in settings['metadata']['sizes']:
# The src is a file pointing to the real video file
format_url = re.sub(r'_(\d*p.mov)', r'_h\1', format['src'])
formats.append({
'url': format_url,
'ext': determine_ext(format_url),
'format': format['type'],
'width': format['width'],
'height': int(format['height']),
})
formats = sorted(formats, key=lambda f: (f['height'], f['width']))
info = {
'_type': 'video',
'id': video_id,
'title': title,
'formats': formats,
'title': title,
'duration': duration,
'thumbnail': thumbnail,
'upload_date': upload_date,
'uploader_id': uploader_id,
'user_agent': 'QuickTime compatible (youtube-dl)',
}
# TODO: Remove when #980 has been merged
info['url'] = formats[-1]['url']
info['ext'] = formats[-1]['ext']
playlist.append(info)
return {
'_type': 'playlist',
'id': movie,
'entries': playlist,
}

View File

@@ -46,6 +46,8 @@ class ArchiveOrgIE(InfoExtractor):
for fn,fdata in data['files'].items()
if 'Video' in fdata['format']]
formats.sort(key=lambda fdata: fdata['file_size'])
for f in formats:
f['ext'] = determine_ext(f['url'])
info = {
'_type': 'video',
@@ -61,7 +63,6 @@ class ArchiveOrgIE(InfoExtractor):
info['thumbnail'] = thumbnail
# TODO: Remove when #980 has been merged
info['url'] = formats[-1]['url']
info['ext'] = determine_ext(formats[-1]['url'])
info.update(formats[-1])
return info
return info

View File

@@ -1,3 +1,4 @@
# encoding: utf-8
import re
import json
import xml.etree.ElementTree
@@ -7,23 +8,24 @@ from ..utils import (
ExtractorError,
find_xpath_attr,
unified_strdate,
determine_ext,
get_element_by_id,
)
# There are different sources of video in arte.tv, the extraction process
# is different for each one. The videos usually expire in 7 days, so we can't
# add tests.
class ArteTvIE(InfoExtractor):
"""
There are two sources of video in arte.tv: videos.arte.tv and
www.arte.tv/guide, the extraction process is different for each one.
The videos expire in 7 days, so we can't add tests.
"""
_EMISSION_URL = r'(?:http://)?www\.arte.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
_VIDEOS_URL = r'(?:http://)?videos.arte.tv/(?P<lang>fr|de)/.*-(?P<id>.*?).html'
_LIVEWEB_URL = r'(?:http://)?liveweb.arte.tv/(?P<lang>fr|de)/(?P<subpage>.+?)/(?P<name>.+)'
_LIVE_URL = r'index-[0-9]+\.html$'
IE_NAME = u'arte.tv'
@classmethod
def suitable(cls, url):
return any(re.match(regex, url) for regex in (cls._EMISSION_URL, cls._VIDEOS_URL))
return any(re.match(regex, url) for regex in (cls._VIDEOS_URL, cls._LIVEWEB_URL))
# TODO implement Live Stream
# from ..utils import compat_urllib_parse
@@ -54,66 +56,23 @@ class ArteTvIE(InfoExtractor):
# video_url = u'%s/%s' % (info.get('url'), info.get('path'))
def _real_extract(self, url):
mobj = re.match(self._EMISSION_URL, url)
if mobj is not None:
lang = mobj.group('lang')
# This is not a real id, it can be for example AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
return self._extract_emission(url, video_id, lang)
mobj = re.match(self._VIDEOS_URL, url)
if mobj is not None:
id = mobj.group('id')
lang = mobj.group('lang')
return self._extract_video(url, id, lang)
mobj = re.match(self._LIVEWEB_URL, url)
if mobj is not None:
name = mobj.group('name')
lang = mobj.group('lang')
return self._extract_liveweb(url, name, lang)
if re.search(self._LIVE_URL, video_id) is not None:
raise ExtractorError(u'Arte live streams are not yet supported, sorry')
# self.extractLiveStream(url)
# return
def _extract_emission(self, url, video_id, lang):
"""Extract from www.arte.tv/guide"""
webpage = self._download_webpage(url, video_id)
json_url = self._html_search_regex(r'arte_vp_url="(.*?)"', webpage, 'json url')
json_info = self._download_webpage(json_url, video_id, 'Downloading info json')
self.report_extraction(video_id)
info = json.loads(json_info)
player_info = info['videoJsonPlayer']
info_dict = {'id': player_info['VID'],
'title': player_info['VTI'],
'description': player_info['VDE'],
'upload_date': unified_strdate(player_info['VDA'].split(' ')[0]),
'thumbnail': player_info['programImage'],
'ext': 'flv',
}
formats = player_info['VSR'].values()
def _match_lang(f):
# Return true if that format is in the language of the url
if lang == 'fr':
l = 'F'
elif lang == 'de':
l = 'A'
regexes = [r'VO?%s' % l, r'V%s-ST.' % l]
return any(re.match(r, f['versionCode']) for r in regexes)
# Some formats may not be in the same language as the url
formats = filter(_match_lang, formats)
# We order the formats by quality
formats = sorted(formats, key=lambda f: int(f['height']))
# Pick the best quality
format_info = formats[-1]
if format_info['mediaType'] == u'rtmp':
info_dict['url'] = format_info['streamer']
info_dict['play_path'] = 'mp4:' + format_info['url']
else:
info_dict['url'] = format_info['url']
return info_dict
def _extract_video(self, url, video_id, lang):
"""Extract from videos.arte.tv"""
ref_xml_url = url.replace('/videos/', '/do_delegate/videos/')
@@ -144,3 +103,129 @@ class ArteTvIE(InfoExtractor):
'url': video_url,
'ext': 'flv',
}
def _extract_liveweb(self, url, name, lang):
"""Extract form http://liveweb.arte.tv/"""
webpage = self._download_webpage(url, name)
video_id = self._search_regex(r'eventId=(\d+?)("|&)', webpage, u'event id')
config_xml = self._download_webpage('http://download.liveweb.arte.tv/o21/liveweb/events/event-%s.xml' % video_id,
video_id, u'Downloading information')
config_doc = xml.etree.ElementTree.fromstring(config_xml.encode('utf-8'))
event_doc = config_doc.find('event')
url_node = event_doc.find('video').find('urlHd')
if url_node is None:
url_node = video_doc.find('urlSd')
return {'id': video_id,
'title': event_doc.find('name%s' % lang.capitalize()).text,
'url': url_node.text.replace('MP4', 'mp4'),
'ext': 'flv',
'thumbnail': self._og_search_thumbnail(webpage),
}
class ArteTVPlus7IE(InfoExtractor):
IE_NAME = u'arte.tv:+7'
_VALID_URL = r'https?://www\.arte.tv/guide/(?P<lang>fr|de)/(?:(?:sendungen|emissions)/)?(?P<id>.*?)/(?P<name>.*?)(\?.*)?'
@classmethod
def _extract_url_info(cls, url):
mobj = re.match(cls._VALID_URL, url)
lang = mobj.group('lang')
# This is not a real id, it can be for example AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
return video_id, lang
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, video_id)
return self._extract_from_webpage(webpage, video_id, lang)
def _extract_from_webpage(self, webpage, video_id, lang):
json_url = self._html_search_regex(r'arte_vp_url="(.*?)"', webpage, 'json url')
json_info = self._download_webpage(json_url, video_id, 'Downloading info json')
self.report_extraction(video_id)
info = json.loads(json_info)
player_info = info['videoJsonPlayer']
info_dict = {
'id': player_info['VID'],
'title': player_info['VTI'],
'description': player_info.get('VDE'),
'upload_date': unified_strdate(player_info.get('VDA', '').split(' ')[0]),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
}
formats = player_info['VSR'].values()
def _match_lang(f):
if f.get('versionCode') is None:
return True
# Return true if that format is in the language of the url
if lang == 'fr':
l = 'F'
elif lang == 'de':
l = 'A'
regexes = [r'VO?%s' % l, r'VO?.-ST%s' % l]
return any(re.match(r, f['versionCode']) for r in regexes)
# Some formats may not be in the same language as the url
formats = filter(_match_lang, formats)
# Some formats use the m3u8 protocol
formats = filter(lambda f: f.get('videoFormat') != 'M3U8', formats)
# We order the formats by quality
formats = sorted(formats, key=lambda f: int(f.get('height',-1)))
# Prefer videos without subtitles in the same language
formats = sorted(formats, key=lambda f: re.match(r'VO(F|A)-STM\1', f.get('versionCode', '')) is None)
# Pick the best quality
def _format(format_info):
info = {
'width': format_info.get('width'),
'height': format_info.get('height'),
}
if format_info['mediaType'] == u'rtmp':
info['url'] = format_info['streamer']
info['play_path'] = 'mp4:' + format_info['url']
info['ext'] = 'flv'
else:
info['url'] = format_info['url']
info['ext'] = determine_ext(info['url'])
return info
info_dict['formats'] = [_format(f) for f in formats]
# TODO: Remove when #980 has been merged
info_dict.update(info_dict['formats'][-1])
return info_dict
# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = u'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de)/magazine?/(?P<id>.+)'
_TEST = {
u'url': u'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
u'file': u'050489-002.mp4',
u'info_dict': {
u'title': u'Agentur Amateur #2 - Corporate Design',
},
}
class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = u'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de)/(thema|sujet)/.*?#article-anchor-(?P<id>\d+)'
_TEST = {
u'url': u'http://future.arte.tv/fr/sujet/info-sciences#article-anchor-7081',
u'file': u'050940-003.mp4',
u'info_dict': {
u'title': u'Les champignons au secours de la planète',
},
}
def _real_extract(self, url):
anchor_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, anchor_id)
row = get_element_by_id(anchor_id, webpage)
return self._extract_from_webpage(row, anchor_id, lang)

View File

@@ -115,7 +115,7 @@ class BlipTVIE(InfoExtractor):
ext = umobj.group(1)
info = {
'id': data['item_id'],
'id': compat_str(data['item_id']),
'url': video_url,
'uploader': data['display_name'],
'upload_date': upload_date,

View File

@@ -0,0 +1,27 @@
import re
from .common import InfoExtractor
class BloombergIE(InfoExtractor):
_VALID_URL = r'https?://www\.bloomberg\.com/video/(?P<name>.+?).html'
_TEST = {
u'url': u'http://www.bloomberg.com/video/shah-s-presentation-on-foreign-exchange-strategies-qurhIVlJSB6hzkVi229d8g.html',
u'file': u'12bzhqZTqQHmmlA8I-i0NpzJgcG5NNYX.mp4',
u'info_dict': {
u'title': u'Shah\'s Presentation on Foreign-Exchange Strategies',
u'description': u'md5:abc86e5236f9f0e4866c59ad36736686',
},
u'params': {
# Requires ffmpeg (m3u8 manifest)
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
webpage = self._download_webpage(url, name)
ooyala_url = self._og_search_video_url(webpage)
return self.url_result(ooyala_url, ie='Ooyala')

View File

@@ -1,3 +1,5 @@
# encoding: utf-8
import re
import json
import xml.etree.ElementTree
@@ -7,15 +9,39 @@ from ..utils import (
compat_urllib_parse,
find_xpath_attr,
compat_urlparse,
ExtractorError,
)
class BrightcoveIE(InfoExtractor):
_VALID_URL = r'https?://.*brightcove\.com/(services|viewer).*\?(?P<query>.*)'
_FEDERATED_URL_TEMPLATE = 'http://c.brightcove.com/services/viewer/htmlFederated?%s'
_PLAYLIST_URL_TEMPLATE = 'http://c.brightcove.com/services/json/experience/runtime/?command=get_programming_for_experience&playerKey=%s'
# There is a test for Brigtcove in GenericIE, that way we test both the download
# and the detection of videos, and we don't have to find an URL that is always valid
_TESTS = [
{
# From http://www.8tv.cat/8aldia/videos/xavier-sala-i-martin-aquesta-tarda-a-8-al-dia/
u'url': u'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1654948606001&flashID=myExperience&%40videoPlayer=2371591881001',
u'file': u'2371591881001.mp4',
u'md5': u'9e80619e0a94663f0bdc849b4566af19',
u'note': u'Test Brightcove downloads and detection in GenericIE',
u'info_dict': {
u'title': u'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
u'uploader': u'8TV',
u'description': u'md5:a950cc4285c43e44d763d036710cd9cd',
}
},
{
# From http://medianetwork.oracle.com/video/player/1785452137001
u'url': u'http://c.brightcove.com/services/viewer/htmlFederated?playerID=1217746023001&flashID=myPlayer&%40videoPlayer=1785452137001',
u'file': u'1785452137001.flv',
u'info_dict': {
u'title': u'JVMLS 2012: Arrays 2.0 - Opportunities and Challenges',
u'description': u'John Rose speaks at the JVM Language Summit, August 1, 2012.',
u'uploader': u'Oracle',
},
},
]
@classmethod
def _build_brighcove_url(cls, object_str):
@@ -23,6 +49,11 @@ class BrightcoveIE(InfoExtractor):
Build a Brightcove url from a xml string containing
<object class="BrightcoveExperience">{params}</object>
"""
# Fix up some stupid HTML, see https://github.com/rg3/youtube-dl/issues/1553
object_str = re.sub(r'(<param name="[^"]+" value="[^"]+")>',
lambda m: m.group(1) + '/>', object_str)
object_doc = xml.etree.ElementTree.fromstring(object_str)
assert u'BrightcoveExperience' in object_doc.attrib['class']
params = {'flashID': object_doc.attrib['id'],
@@ -72,15 +103,27 @@ class BrightcoveIE(InfoExtractor):
playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
def _extract_video_info(self, video_info):
renditions = video_info['renditions']
renditions = sorted(renditions, key=lambda r: r['size'])
best_format = renditions[-1]
info = {
'id': video_info['id'],
'title': video_info['displayName'],
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
}
return {'id': video_info['id'],
'title': video_info['displayName'],
'url': best_format['defaultURL'],
renditions = video_info.get('renditions')
if renditions:
renditions = sorted(renditions, key=lambda r: r['size'])
best_format = renditions[-1]
info.update({
'url': best_format['defaultURL'],
'ext': 'mp4',
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
}
})
elif video_info.get('FLVFullLengthURL') is not None:
info.update({
'url': video_info['FLVFullLengthURL'],
'ext': 'flv',
})
else:
raise ExtractorError(u'Unable to extract video url for %s' % info['id'])
return info

View File

@@ -12,8 +12,8 @@ class C56IE(InfoExtractor):
_TEST ={
u'url': u'http://www.56.com/u39/v_OTM0NDA3MTY.html',
u'file': u'93440716.mp4',
u'md5': u'9dc07b5c8e978112a6441f9e75d2b59e',
u'file': u'93440716.flv',
u'md5': u'e59995ac63d0457783ea05f93f12a866',
u'info_dict': {
u'title': u'网事知多少 第32期车怒',
},

View File

@@ -0,0 +1,35 @@
# coding: utf-8
import re
from .common import InfoExtractor
class Canalc2IE(InfoExtractor):
IE_NAME = 'canalc2.tv'
_VALID_URL = r'http://.*?\.canalc2\.tv/video\.asp\?idVideo=(\d+)&voir=oui'
_TEST = {
u'url': u'http://www.canalc2.tv/video.asp?idVideo=12163&voir=oui',
u'file': u'12163.mp4',
u'md5': u'060158428b650f896c542dfbb3d6487f',
u'info_dict': {
u'title': u'Terrasses du Numérique'
}
}
def _real_extract(self, url):
video_id = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, video_id)
file_name = self._search_regex(
r"so\.addVariable\('file','(.*?)'\);",
webpage, 'file name')
video_url = 'http://vod-flash.u-strasbg.fr:8080/' + file_name
title = self._html_search_regex(
r'class="evenement8">(.*?)</a>', webpage, u'title')
return {'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': title,
}

View File

@@ -1,3 +1,4 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
@@ -5,24 +6,29 @@ from .common import InfoExtractor
from ..utils import unified_strdate
class CanalplusIE(InfoExtractor):
_VALID_URL = r'https?://www\.canalplus\.fr/.*?\?vid=(?P<id>\d+)'
_VALID_URL = r'https?://(www\.canalplus\.fr/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>\d+))'
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/cplus/%s'
IE_NAME = u'canalplus.fr'
_TEST = {
u'url': u'http://www.canalplus.fr/c-divertissement/pid3351-c-le-petit-journal.html?vid=889861',
u'file': u'889861.flv',
u'md5': u'590a888158b5f0d6832f84001fbf3e99',
u'url': u'http://www.canalplus.fr/c-infos-documentaires/pid1830-c-zapping.html?vid=922470',
u'file': u'922470.flv',
u'info_dict': {
u'title': u'Le Petit Journal 20/06/13 - La guerre des drone',
u'upload_date': u'20130620',
u'title': u'Zapping - 26/08/13',
u'description': u'Le meilleur de toutes les chaînes, tous les jours.\nEmission du 26 août 2013',
u'upload_date': u'20130826',
},
u'params': {
u'skip_download': True,
},
u'skip': u'Requires rtmpdump'
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
if video_id is None:
webpage = self._download_webpage(url, mobj.group('path'))
video_id = self._search_regex(r'videoId = "(\d+)";', webpage, u'video id')
info_url = self._VIDEO_INFO_TEMPLATE % video_id
info_page = self._download_webpage(info_url,video_id,
u'Downloading video info')
@@ -43,4 +49,6 @@ class CanalplusIE(InfoExtractor):
'ext': 'flv',
'upload_date': unified_strdate(infos.find('PUBLICATION/DATE').text),
'thumbnail': media.find('IMAGES/GRAND').text,
'description': infos.find('DESCRIPTION').text,
'view_count': int(infos.find('NB_VUES').text),
}

View File

@@ -0,0 +1,91 @@
# encoding: utf-8
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class CinemassacreIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?(?P<url>cinemassacre\.com/(?P<date_Y>[0-9]{4})/(?P<date_m>[0-9]{2})/(?P<date_d>[0-9]{2})/.+?)(?:[/?].*)?'
_TESTS = [{
u'url': u'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/',
u'file': u'19911.flv',
u'info_dict': {
u'upload_date': u'20121110',
u'title': u'“Angry Video Game Nerd: The Movie” Trailer',
u'description': u'md5:fb87405fcb42a331742a0dce2708560b',
},
u'params': {
# rtmp download
u'skip_download': True,
},
},
{
u'url': u'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
u'file': u'521be8ef82b16.flv',
u'info_dict': {
u'upload_date': u'20131002',
u'title': u'The Mummys Hand (1940)',
},
u'params': {
# rtmp download
u'skip_download': True,
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
webpage_url = u'http://' + mobj.group('url')
webpage = self._download_webpage(webpage_url, None) # Don't know video id yet
video_date = mobj.group('date_Y') + mobj.group('date_m') + mobj.group('date_d')
mobj = re.search(r'src="(?P<embed_url>http://player\.screenwavemedia\.com/play/(?:embed|player)\.php\?id=(?:Cinemassacre-)?(?P<video_id>.+?))"', webpage)
if not mobj:
raise ExtractorError(u'Can\'t extract embed url and video id')
playerdata_url = mobj.group(u'embed_url')
video_id = mobj.group(u'video_id')
video_title = self._html_search_regex(r'<title>(?P<title>.+?)\|',
webpage, u'title')
video_description = self._html_search_regex(r'<div class="entry-content">(?P<description>.+?)</div>',
webpage, u'description', flags=re.DOTALL, fatal=False)
if len(video_description) == 0:
video_description = None
playerdata = self._download_webpage(playerdata_url, video_id)
base_url = self._html_search_regex(r'\'streamer\': \'(?P<base_url>rtmp://.*?)/(?:vod|Cinemassacre)\'',
playerdata, u'base_url')
base_url += '/Cinemassacre/'
# Important: The file names in playerdata are not used by the player and even wrong for some videos
sd_file = 'Cinemassacre-%s_high.mp4' % video_id
hd_file = 'Cinemassacre-%s.mp4' % video_id
video_thumbnail = 'http://image.screenwavemedia.com/Cinemassacre/Cinemassacre-%s_thumb_640x360.jpg' % video_id
formats = [
{
'url': base_url + sd_file,
'ext': 'flv',
'format': 'sd',
'format_id': 'sd',
},
{
'url': base_url + hd_file,
'ext': 'flv',
'format': 'hd',
'format_id': 'hd',
},
]
info = {
'id': video_id,
'title': video_title,
'formats': formats,
'description': video_description,
'upload_date': video_date,
'thumbnail': video_thumbnail,
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@@ -0,0 +1,58 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import determine_ext
class CNNIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(edition\.)?cnn\.com/video/(data/.+?|\?)/
(?P<path>.+?/(?P<title>[^/]+?)(?:\.cnn|(?=&)))'''
_TESTS = [{
u'url': u'http://edition.cnn.com/video/?/video/sports/2013/06/09/nadal-1-on-1.cnn',
u'file': u'sports_2013_06_09_nadal-1-on-1.cnn.mp4',
u'md5': u'3e6121ea48df7e2259fe73a0628605c4',
u'info_dict': {
u'title': u'Nadal wins 8th French Open title',
u'description': u'World Sport\'s Amanda Davies chats with 2013 French Open champion Rafael Nadal.',
},
},
{
u"url": u"http://edition.cnn.com/video/?/video/us/2013/08/21/sot-student-gives-epic-speech.georgia-institute-of-technology&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+rss%2Fcnn_topstories+%28RSS%3A+Top+Stories%29",
u"file": u"us_2013_08_21_sot-student-gives-epic-speech.georgia-institute-of-technology.mp4",
u"md5": u"b5cc60c60a3477d185af8f19a2a26f4e",
u"info_dict": {
u"title": "Student's epic speech stuns new freshmen",
u"description": "A Georgia Tech student welcomes the incoming freshmen with an epic speech backed by music from \"2001: A Space Odyssey.\""
}
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
path = mobj.group('path')
page_title = mobj.group('title')
info_url = u'http://cnn.com/video/data/3.0/%s/index.xml' % path
info_xml = self._download_webpage(info_url, page_title)
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
formats = []
for f in info.findall('files/file'):
mf = re.match(r'(\d+)x(\d+)(?:_(.*)k)?',f.attrib['bitrate'])
if mf is not None:
formats.append((int(mf.group(1)), int(mf.group(2)), int(mf.group(3) or 0), f.text))
formats = sorted(formats)
(_,_,_, video_path) = formats[-1]
video_url = 'http://ht.cdn.turner.com/cnn/big%s' % video_path
thumbnails = sorted([((int(t.attrib['height']),int(t.attrib['width'])), t.text) for t in info.findall('images/image')])
thumbs_dict = [{'resolution': res, 'url': t_url} for (res, t_url) in thumbnails]
return {'id': info.attrib['id'],
'title': info.find('headline').text,
'url': video_url,
'ext': determine_ext(video_url),
'thumbnail': thumbnails[-1][1],
'thumbnails': thumbs_dict,
'description': info.find('description').text,
}

View File

@@ -4,15 +4,16 @@ import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse_urlparse,
determine_ext,
ExtractorError,
)
class CollegeHumorIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?collegehumor\.com/(video|embed)/(?P<videoid>[0-9]+)/(?P<shorttitle>.*)$'
_VALID_URL = r'^(?:https?://)?(?:www\.)?collegehumor\.com/(video|embed|e)/(?P<videoid>[0-9]+)/?(?P<shorttitle>.*)$'
_TEST = {
_TESTS = [{
u'url': u'http://www.collegehumor.com/video/6902724/comic-con-cosplay-catastrophe',
u'file': u'6902724.mp4',
u'md5': u'1264c12ad95dca142a9f0bf7968105a0',
@@ -20,7 +21,16 @@ class CollegeHumorIE(InfoExtractor):
u'title': u'Comic-Con Cosplay Catastrophe',
u'description': u'Fans get creative this year at San Diego. Too creative. And yes, that\'s really Joss Whedon.',
},
}
},
{
u'url': u'http://www.collegehumor.com/video/3505939/font-conference',
u'file': u'3505939.mp4',
u'md5': u'c51ca16b82bb456a4397987791a835f5',
u'info_dict': {
u'title': u'Font Conference',
u'description': u'This video wasn\'t long enough, so we made it double-spaced.',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -49,25 +59,29 @@ class CollegeHumorIE(InfoExtractor):
info['description'] = videoNode.findall('./description')[0].text
info['title'] = videoNode.findall('./caption')[0].text
info['thumbnail'] = videoNode.findall('./thumbnail')[0].text
manifest_url = videoNode.findall('./file')[0].text
next_url = videoNode.findall('./file')[0].text
except IndexError:
raise ExtractorError(u'Invalid metadata XML file')
manifest_url += '?hdcore=2.10.3'
manifestXml = self._download_webpage(manifest_url, video_id,
u'Downloading XML manifest',
u'Unable to download video info XML')
if next_url.endswith(u'manifest.f4m'):
manifest_url = next_url + '?hdcore=2.10.3'
manifestXml = self._download_webpage(manifest_url, video_id,
u'Downloading XML manifest',
u'Unable to download video info XML')
adoc = xml.etree.ElementTree.fromstring(manifestXml)
try:
media_node = adoc.findall('./{http://ns.adobe.com/f4m/1.0}media')[0]
node_id = media_node.attrib['url']
video_id = adoc.findall('./{http://ns.adobe.com/f4m/1.0}id')[0].text
except IndexError as err:
raise ExtractorError(u'Invalid manifest file')
adoc = xml.etree.ElementTree.fromstring(manifestXml)
try:
media_node = adoc.findall('./{http://ns.adobe.com/f4m/1.0}media')[0]
node_id = media_node.attrib['url']
video_id = adoc.findall('./{http://ns.adobe.com/f4m/1.0}id')[0].text
except IndexError as err:
raise ExtractorError(u'Invalid manifest file')
url_pr = compat_urllib_parse_urlparse(info['thumbnail'])
info['url'] = url_pr.scheme + '://' + url_pr.netloc + video_id[:-2].replace('.csmil','').replace(',','')
info['ext'] = 'mp4'
else:
# Old-style direct links
info['url'] = next_url
info['ext'] = determine_ext(info['url'])
url_pr = compat_urllib_parse_urlparse(info['thumbnail'])
info['url'] = url_pr.scheme + '://' + url_pr.netloc + video_id[:-2].replace('.csmil','').replace(',','')
info['ext'] = 'mp4'
return [info]
return info

View File

@@ -51,12 +51,12 @@ class ComedyCentralIE(InfoExtractor):
'400': 'mp4',
}
_video_dimensions = {
'3500': '1280x720',
'2200': '960x540',
'1700': '768x432',
'1200': '640x360',
'750': '512x288',
'400': '384x216',
'3500': (1280, 720),
'2200': (960, 540),
'1700': (768, 432),
'1200': (640, 360),
'750': (512, 288),
'400': (384, 216),
}
@classmethod
@@ -64,11 +64,13 @@ class ComedyCentralIE(InfoExtractor):
"""Receives a URL and returns True if suitable for this IE."""
return re.match(cls._VALID_URL, url, re.VERBOSE) is not None
def _print_formats(self, formats):
print('Available formats:')
for x in formats:
print('%s\t:\t%s\t[%s]' %(x, self._video_extensions.get(x, 'mp4'), self._video_dimensions.get(x, '???')))
@staticmethod
def _transform_rtmp_url(rtmp_video_url):
m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp.comedystor/.*)$', rtmp_video_url)
if not m:
raise ExtractorError(u'Cannot transform RTMP url')
base = 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=1+_pxI0=Ripod-h264+_pxL0=undefined+_pxM0=+_pxK=18639+_pxE=mp4/44620/mtvnorigin/'
return base + m.group('finalid')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url, re.VERBOSE)
@@ -155,40 +157,31 @@ class ComedyCentralIE(InfoExtractor):
self._downloader.report_error(u'unable to download ' + mediaId + ': No videos found')
continue
if self._downloader.params.get('listformats', None):
self._print_formats([i[0] for i in turls])
return
# For now, just pick the highest bitrate
format,rtmp_video_url = turls[-1]
# Get the format arg from the arg stream
req_format = self._downloader.params.get('format', None)
# Select format if we can find one
for f,v in turls:
if f == req_format:
format, rtmp_video_url = f, v
break
m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp.comedystor/.*)$', rtmp_video_url)
if not m:
raise ExtractorError(u'Cannot transform RTMP url')
base = 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=1+_pxI0=Ripod-h264+_pxL0=undefined+_pxM0=+_pxK=18639+_pxE=mp4/44620/mtvnorigin/'
video_url = base + m.group('finalid')
formats = []
for format, rtmp_video_url in turls:
w, h = self._video_dimensions.get(format, (None, None))
formats.append({
'url': self._transform_rtmp_url(rtmp_video_url),
'ext': self._video_extensions.get(format, 'mp4'),
'format_id': format,
'height': h,
'width': w,
})
effTitle = showId + u'-' + epTitle + u' part ' + compat_str(partNum+1)
info = {
'id': shortMediaId,
'url': video_url,
'formats': formats,
'uploader': showId,
'upload_date': officialDate,
'title': effTitle,
'ext': 'mp4',
'format': format,
'thumbnail': None,
'description': compat_str(officialTitle),
}
# TODO: Remove when #980 has been merged
info.update(info['formats'][-1])
results.append(info)
return results

View File

@@ -35,6 +35,8 @@ class InfoExtractor(object):
title: Video title, unescaped.
ext: Video filename extension.
Instead of url and ext, formats can also specified.
The following fields are optional:
format: The video format, defaults to ext (used for --get-format)
@@ -47,12 +49,25 @@ class InfoExtractor(object):
uploader_id: Nickname or id of the video uploader.
location: Physical location of the video.
player_url: SWF Player URL (used for rtmpdump).
subtitles: The subtitle file contents.
subtitles: The subtitle file contents as a dictionary in the format
{language: subtitles}.
view_count: How many users have watched the video on the platform.
urlhandle: [internal] The urlHandle to be used to download the file,
like returned by urllib.request.urlopen
age_limit: Age restriction for the video, as an integer (years)
formats: A list of dictionaries for each format available, it must
be ordered from worst to best quality. Potential fields:
* url Mandatory. The URL of the video file
* ext Will be calculated from url if missing
* format A human-readable description of the format
("mp4 container with h264/opus").
Calculated from width and height if missing.
* format_id A short description of the format
("mp4_h264_opus" or "19")
* width Width of the video, if known
* height Height of the video, if known
The fields should all be Unicode strings.
Unless mentioned otherwise, the fields should be Unicode strings.
Subclasses of this one should re-define the _real_initialize() and
_real_extract() methods and define a _VALID_URL regexp.
@@ -77,7 +92,13 @@ class InfoExtractor(object):
@classmethod
def suitable(cls, url):
"""Receives a URL and returns True if suitable for this IE."""
return re.match(cls._VALID_URL, url) is not None
# This does not use has/getattr intentionally - we want to know whether
# we have cached the regexp for *this* class, whereas getattr would also
# match the superclass
if '_VALID_URL_RE' not in cls.__dict__:
cls._VALID_URL_RE = re.compile(cls._VALID_URL)
return cls._VALID_URL_RE.match(url) is not None
@classmethod
def working(cls):
@@ -107,6 +128,11 @@ class InfoExtractor(object):
"""Real extraction process. Redefine in subclasses."""
pass
@classmethod
def ie_key(cls):
"""A string for getting the InfoExtractor with get_info_extractor"""
return cls.__name__[:-2]
@property
def IE_NAME(self):
return type(self).__name__[:-2]
@@ -122,7 +148,7 @@ class InfoExtractor(object):
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
if errnote is None:
errnote = u'Unable to download webpage'
raise ExtractorError(u'%s: %s' % (errnote, compat_str(err)), sys.exc_info()[2])
raise ExtractorError(u'%s: %s' % (errnote, compat_str(err)), sys.exc_info()[2], cause=err)
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None):
""" Returns a tuple (page content as string, URL handle) """
@@ -133,12 +159,17 @@ class InfoExtractor(object):
urlh = self._request_webpage(url_or_request, video_id, note, errnote)
content_type = urlh.headers.get('Content-Type', '')
webpage_bytes = urlh.read()
m = re.match(r'[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+\s*;\s*charset=(.+)', content_type)
if m:
encoding = m.group(1)
else:
encoding = 'utf-8'
webpage_bytes = urlh.read()
m = re.search(br'<meta[^>]+charset=[\'"]?([^\'")]+)[ /\'">]',
webpage_bytes[:1024])
if m:
encoding = m.group(1).decode('ascii')
else:
encoding = 'utf-8'
if self._downloader.params.get('dump_intermediate_pages', False):
try:
url = url_or_request.get_full_url()
@@ -288,6 +319,15 @@ class InfoExtractor(object):
self._og_regex('video')],
html, name, **kargs)
def _rta_search(self, html):
# See http://www.rtalabel.org/index.php?content=howtofaq#single
if re.search(r'(?ix)<meta\s+name="rating"\s+'
r' content="RTA-5042-1996-1400-1577-RTA"',
html):
return 18
return 0
class SearchInfoExtractor(InfoExtractor):
"""
Base class for paged search queries extractors.

View File

@@ -1,27 +1,58 @@
import re
import json
import itertools
from .common import InfoExtractor
from .subtitles import SubtitlesInfoExtractor
from ..utils import (
compat_urllib_request,
compat_str,
get_element_by_attribute,
get_element_by_id,
orderedSet,
ExtractorError,
)
class DailymotionIE(InfoExtractor):
class DailymotionBaseInfoExtractor(InfoExtractor):
@staticmethod
def _build_request(url):
"""Build a request with the family filter disabled"""
request = compat_urllib_request.Request(url)
request.add_header('Cookie', 'family_filter=off')
return request
class DailymotionIE(DailymotionBaseInfoExtractor, SubtitlesInfoExtractor):
"""Information Extractor for Dailymotion"""
_VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/video/([^/]+)'
_VALID_URL = r'(?i)(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/(?:embed/)?video/([^/]+)'
IE_NAME = u'dailymotion'
_TEST = {
u'url': u'http://www.dailymotion.com/video/x33vw9_tutoriel-de-youtubeur-dl-des-video_tech',
u'file': u'x33vw9.mp4',
u'md5': u'392c4b85a60a90dc4792da41ce3144eb',
u'info_dict': {
u"uploader": u"Alex and Van .",
u"title": u"Tutoriel de Youtubeur\"DL DES VIDEO DE YOUTUBE\""
}
}
_TESTS = [
{
u'url': u'http://www.dailymotion.com/video/x33vw9_tutoriel-de-youtubeur-dl-des-video_tech',
u'file': u'x33vw9.mp4',
u'md5': u'392c4b85a60a90dc4792da41ce3144eb',
u'info_dict': {
u"uploader": u"Amphora Alex and Van .",
u"title": u"Tutoriel de Youtubeur\"DL DES VIDEO DE YOUTUBE\""
}
},
# Vevo video
{
u'url': u'http://www.dailymotion.com/video/x149uew_katy-perry-roar-official_musi',
u'file': u'USUV71301934.mp4',
u'info_dict': {
u'title': u'Roar (Official)',
u'uploader': u'Katy Perry',
u'upload_date': u'20130905',
},
u'params': {
u'skip_download': True,
},
u'skip': u'VEVO is only available in some countries',
},
]
def _real_extract(self, url):
# Extract id and simplified title from URL
@@ -30,15 +61,24 @@ class DailymotionIE(InfoExtractor):
video_id = mobj.group(1).split('_')[0].split('?')[0]
video_extension = 'mp4'
url = 'http://www.dailymotion.com/video/%s' % video_id
# Retrieve video webpage to extract further information
request = compat_urllib_request.Request(url)
request.add_header('Cookie', 'family_filter=off')
request = self._build_request(url)
webpage = self._download_webpage(request, video_id)
# Extract URL, uploader and title from webpage
self.report_extraction(video_id)
# It may just embed a vevo video:
m_vevo = re.search(
r'<link rel="video_src" href="[^"]*?vevo.com[^"]*?videoId=(?P<id>[\w]*)',
webpage)
if m_vevo is not None:
vevo_id = m_vevo.group('id')
self.to_screen(u'Vevo video detected: %s' % vevo_id)
return self.url_result(u'vevo:%s' % vevo_id, ie='Vevo')
video_uploader = self._search_regex([r'(?im)<span class="owner[^\"]+?">[^<]+?<a [^>]+?>([^<]+?)</a>',
# Looking for official user
r'<(?:span|a) .*?rel="author".*?>([^<]+?)</'],
@@ -52,8 +92,12 @@ class DailymotionIE(InfoExtractor):
embed_url = 'http://www.dailymotion.com/embed/video/%s' % video_id
embed_page = self._download_webpage(embed_url, video_id,
u'Downloading embed page')
info = self._search_regex(r'var info = ({.*?}),', embed_page, 'video info')
info = self._search_regex(r'var info = ({.*?}),$', embed_page,
'video info', flags=re.MULTILINE)
info = json.loads(info)
if info.get('error') is not None:
msg = 'Couldn\'t get video, Dailymotion says: %s' % info['error']['title']
raise ExtractorError(msg, expected=True)
# TODO: support choosing qualities
@@ -68,6 +112,12 @@ class DailymotionIE(InfoExtractor):
raise ExtractorError(u'Unable to extract video URL')
video_url = info[max_quality]
# subtitles
video_subtitles = self.extract_subtitles(video_id)
if self._downloader.params.get('listsubtitles', False):
self._list_available_subtitles(video_id)
return
return [{
'id': video_id,
'url': video_url,
@@ -75,5 +125,76 @@ class DailymotionIE(InfoExtractor):
'upload_date': video_upload_date,
'title': self._og_search_title(webpage),
'ext': video_extension,
'subtitles': video_subtitles,
'thumbnail': info['thumbnail_url']
}]
def _get_available_subtitles(self, video_id):
try:
sub_list = self._download_webpage(
'https://api.dailymotion.com/video/%s/subtitles?fields=id,language,url' % video_id,
video_id, note=False)
except ExtractorError as err:
self._downloader.report_warning(u'unable to download video subtitles: %s' % compat_str(err))
return {}
info = json.loads(sub_list)
if (info['total'] > 0):
sub_lang_list = dict((l['language'], l['url']) for l in info['list'])
return sub_lang_list
self._downloader.report_warning(u'video doesn\'t have subtitles')
return {}
class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
IE_NAME = u'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>.+?)/'
_MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/playlist/.+?".*?>.*?</a>.*?</div>'
_PAGE_TEMPLATE = 'https://www.dailymotion.com/playlist/%s/%s'
def _extract_entries(self, id):
video_ids = []
for pagenum in itertools.count(1):
request = self._build_request(self._PAGE_TEMPLATE % (id, pagenum))
webpage = self._download_webpage(request,
id, u'Downloading page %s' % pagenum)
playlist_el = get_element_by_attribute(u'class', u'video_list', webpage)
video_ids.extend(re.findall(r'data-id="(.+?)"', playlist_el))
if re.search(self._MORE_PAGES_INDICATOR, webpage, re.DOTALL) is None:
break
return [self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
for video_id in orderedSet(video_ids)]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id)
return {'_type': 'playlist',
'id': playlist_id,
'title': get_element_by_id(u'playlist_name', webpage),
'entries': self._extract_entries(playlist_id),
}
class DailymotionUserIE(DailymotionPlaylistIE):
IE_NAME = u'dailymotion:user'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/user/(?P<user>[^/]+)'
_MORE_PAGES_INDICATOR = r'<div class="next">.*?<a.*?href="/user/.+?".*?>.*?</a>.*?</div>'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user = mobj.group('user')
webpage = self._download_webpage(url, user)
full_user = self._html_search_regex(
r'<a class="label" href="/%s".*?>(.*?)</' % re.escape(user),
webpage, u'user', flags=re.DOTALL)
return {
'_type': 'playlist',
'id': user,
'title': full_user,
'entries': self._extract_entries(user),
}

View File

@@ -0,0 +1,74 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
determine_ext,
)
class DaumIE(InfoExtractor):
_VALID_URL = r'https?://tvpot\.daum\.net/.*?clipid=(?P<id>\d+)'
IE_NAME = u'daum.net'
_TEST = {
u'url': u'http://tvpot.daum.net/clip/ClipView.do?clipid=52554690',
u'file': u'52554690.mp4',
u'info_dict': {
u'title': u'DOTA 2GETHER 시즌2 6회 - 2부',
u'description': u'DOTA 2GETHER 시즌2 6회 - 2부',
u'upload_date': u'20130831',
u'duration': 3868,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
canonical_url = 'http://tvpot.daum.net/v/%s' % video_id
webpage = self._download_webpage(canonical_url, video_id)
full_id = self._search_regex(r'<link rel="video_src" href=".+?vid=(.+?)"',
webpage, u'full id')
query = compat_urllib_parse.urlencode({'vid': full_id})
info_xml = self._download_webpage(
'http://tvpot.daum.net/clip/ClipInfoXml.do?' + query, video_id,
u'Downloading video info')
urls_xml = self._download_webpage(
'http://videofarm.daum.net/controller/api/open/v1_2/MovieData.apixml?' + query,
video_id, u'Downloading video formats info')
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
urls = xml.etree.ElementTree.fromstring(urls_xml.encode('utf-8'))
self.to_screen(u'%s: Getting video urls' % video_id)
formats = []
for format_el in urls.findall('result/output_list/output_list'):
profile = format_el.attrib['profile']
format_query = compat_urllib_parse.urlencode({
'vid': full_id,
'profile': profile,
})
url_xml = self._download_webpage(
'http://videofarm.daum.net/controller/api/open/v1_2/MovieLocation.apixml?' + format_query,
video_id, note=False)
url_doc = xml.etree.ElementTree.fromstring(url_xml.encode('utf-8'))
format_url = url_doc.find('result/url').text
formats.append({
'url': format_url,
'ext': determine_ext(format_url),
'format_id': profile,
})
info = {
'id': video_id,
'title': info.find('TITLE').text,
'formats': formats,
'thumbnail': self._og_search_thumbnail(webpage),
'description': info.find('CONTENTS').text,
'duration': int(info.find('DURATION').text),
'upload_date': info.find('REGDTTM').text[:8],
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@@ -0,0 +1,39 @@
import re
import json
from .common import InfoExtractor
class DefenseGouvFrIE(InfoExtractor):
_IE_NAME = 'defense.gouv.fr'
_VALID_URL = (r'http://.*?\.defense\.gouv\.fr/layout/set/'
r'ligthboxvideo/base-de-medias/webtv/(.*)')
_TEST = {
u'url': (u'http://www.defense.gouv.fr/layout/set/ligthboxvideo/'
u'base-de-medias/webtv/attaque-chimique-syrienne-du-21-aout-2013-1'),
u'file': u'11213.mp4',
u'md5': u'75bba6124da7e63d2d60b5244ec9430c',
"info_dict": {
"title": "attaque-chimique-syrienne-du-21-aout-2013-1"
}
}
def _real_extract(self, url):
title = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, title)
video_id = self._search_regex(
r"flashvars.pvg_id=\"(\d+)\";",
webpage, 'ID')
json_url = ('http://static.videos.gouv.fr/brightcovehub/export/json/'
+ video_id)
info = self._download_webpage(json_url, title,
'Downloading JSON config')
video_url = json.loads(info)['renditions'][0]['url']
return {'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': title,
}

View File

@@ -54,6 +54,7 @@ class DreiSatIE(InfoExtractor):
'width': int(fe.find('./width').text),
'height': int(fe.find('./height').text),
'url': fe.find('./url').text,
'ext': determine_ext(fe.find('./url').text),
'filesize': int(fe.find('./filesize').text),
'video_bitrate': int(fe.find('./videoBitrate').text),
'3sat_qualityname': fe.find('./quality').text,
@@ -79,7 +80,6 @@ class DreiSatIE(InfoExtractor):
}
# TODO: Remove when #980 has been merged
info['url'] = formats[-1]['url']
info['ext'] = determine_ext(formats[-1]['url'])
info.update(formats[-1])
return info
return info

View File

@@ -0,0 +1,37 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import determine_ext
class EbaumsWorldIE(InfoExtractor):
_VALID_URL = r'https?://www\.ebaumsworld\.com/video/watch/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.ebaumsworld.com/video/watch/83367677/',
u'file': u'83367677.mp4',
u'info_dict': {
u'title': u'A Giant Python Opens The Door',
u'description': u'This is how nightmares start...',
u'uploader': u'jihadpizza',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
config_xml = self._download_webpage(
'http://www.ebaumsworld.com/video/player/%s' % video_id, video_id)
config = xml.etree.ElementTree.fromstring(config_xml.encode('utf-8'))
video_url = config.find('file').text
return {
'id': video_id,
'title': config.find('title').text,
'url': video_url,
'ext': determine_ext(video_url),
'description': config.find('description').text,
'thumbnail': config.find('image').text,
'uploader': config.find('username').text,
}

View File

@@ -8,17 +8,30 @@ class ExfmIE(InfoExtractor):
IE_NAME = u'exfm'
IE_DESC = u'ex.fm'
_VALID_URL = r'(?:http://)?(?:www\.)?ex\.fm/song/([^/]+)'
_SOUNDCLOUD_URL_ = r'(?:http://)?(?:www\.)?api\.soundcloud.com/tracks/([^/]+)/stream'
_TEST = {
u'url': u'http://ex.fm/song/1bgtzg',
u'file': u'1bgtzg.mp3',
u'md5': u'8a7967a3fef10e59a1d6f86240fd41cf',
u'info_dict': {
u"title": u"We Can't Stop",
u"uploader": u"Miley Cyrus",
u'thumbnail': u'http://i1.sndcdn.com/artworks-000049666230-w9i7ef-t500x500.jpg?9d68d37'
}
}
_SOUNDCLOUD_URL = r'(?:http://)?(?:www\.)?api\.soundcloud.com/tracks/([^/]+)/stream'
_TESTS = [
{
u'url': u'http://ex.fm/song/1bgtzg',
u'file': u'95223130.mp3',
u'md5': u'8a7967a3fef10e59a1d6f86240fd41cf',
u'info_dict': {
u"title": u"We Can't Stop - Miley Cyrus",
u"uploader": u"Miley Cyrus",
u'upload_date': u'20130603',
u'description': u'Download "We Can\'t Stop" \r\niTunes: http://smarturl.it/WeCantStop?IQid=SC\r\nAmazon: http://smarturl.it/WeCantStopAMZ?IQid=SC',
},
u'note': u'Soundcloud song',
},
{
u'url': u'http://ex.fm/song/wddt8',
u'file': u'wddt8.mp3',
u'md5': u'966bd70741ac5b8570d8e45bfaed3643',
u'info_dict': {
u'title': u'Safe and Sound',
u'uploader': u'Capital Cities',
},
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -26,11 +39,10 @@ class ExfmIE(InfoExtractor):
info_url = "http://ex.fm/api/v3/song/%s" %(song_id)
webpage = self._download_webpage(info_url, song_id)
info = json.loads(webpage)
song_url = re.match(self._SOUNDCLOUD_URL_,info['song']['url'])
if song_url is not None:
song_url = song_url.group() + "?client_id=b45b1aa10f1ac2941910a7f0d10f8e28"
else:
song_url = info['song']['url']
song_url = info['song']['url']
if re.match(self._SOUNDCLOUD_URL, song_url) is not None:
self.to_screen('Soundcloud song detected')
return self.url_result(song_url.replace('/stream',''), 'Soundcloud')
return [{
'id': song_id,
'url': song_url,

View File

@@ -106,8 +106,8 @@ class FacebookIE(InfoExtractor):
video_duration = int(video_data['video_duration'])
thumbnail = video_data['thumbnail_src']
video_title = self._html_search_regex('<h2 class="uiHeaderTitle">([^<]+)</h2>',
webpage, u'title')
video_title = self._html_search_regex(
r'<h2 class="uiHeaderTitle">([^<]*)</h2>', webpage, u'title')
info = {
'id': video_id,

View File

@@ -0,0 +1,60 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
determine_ext,
clean_html,
get_element_by_attribute,
)
class FazIE(InfoExtractor):
IE_NAME = u'faz.net'
_VALID_URL = r'https?://www\.faz\.net/multimedia/videos/.*?-(?P<id>\d+).html'
_TEST = {
u'url': u'http://www.faz.net/multimedia/videos/stockholm-chemie-nobelpreis-fuer-drei-amerikanische-forscher-12610585.html',
u'file': u'12610585.mp4',
u'info_dict': {
u'title': u'Stockholm: Chemie-Nobelpreis für drei amerikanische Forscher',
u'description': u'md5:1453fbf9a0d041d985a47306192ea253',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
self.to_screen(video_id)
webpage = self._download_webpage(url, video_id)
config_xml_url = self._search_regex(r'writeFLV\(\'(.+?)\',', webpage,
u'config xml url')
config_xml = self._download_webpage(config_xml_url, video_id,
u'Downloading config xml')
config = xml.etree.ElementTree.fromstring(config_xml.encode('utf-8'))
encodings = config.find('ENCODINGS')
formats = []
for code in ['LOW', 'HIGH', 'HQ']:
encoding = encodings.find(code)
if encoding is None:
continue
encoding_url = encoding.find('FILENAME').text
formats.append({
'url': encoding_url,
'ext': determine_ext(encoding_url),
'format_id': code.lower(),
})
descr_html = get_element_by_attribute('class', 'Content Copy', webpage)
info = {
'id': video_id,
'title': self._og_search_title(webpage),
'formats': formats,
'description': clean_html(descr_html),
'thumbnail': config.find('STILL/STILL_BIG').text,
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@@ -0,0 +1,79 @@
import re
import random
import json
from .common import InfoExtractor
from ..utils import (
determine_ext,
get_element_by_id,
clean_html,
)
class FKTVIE(InfoExtractor):
IE_NAME = u'fernsehkritik.tv'
_VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik.tv/folge-(?P<ep>[0-9]+)(?:/.*)?'
_TEST = {
u'url': u'http://fernsehkritik.tv/folge-1',
u'file': u'00011.flv',
u'info_dict': {
u'title': u'Folge 1 vom 10. April 2007',
u'description': u'md5:fb4818139c7cfe6907d4b83412a6864f',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
episode = int(mobj.group('ep'))
server = random.randint(2, 4)
video_thumbnail = 'http://fernsehkritik.tv/images/magazin/folge%d.jpg' % episode
start_webpage = self._download_webpage('http://fernsehkritik.tv/folge-%d/Start' % episode,
episode)
playlist = self._search_regex(r'playlist = (\[.*?\]);', start_webpage,
u'playlist', flags=re.DOTALL)
files = json.loads(re.sub('{[^{}]*?}', '{}', playlist))
# TODO: return a single multipart video
videos = []
for i, _ in enumerate(files, 1):
video_id = '%04d%d' % (episode, i)
video_url = 'http://dl%d.fernsehkritik.tv/fernsehkritik%d%s.flv' % (server, episode, '' if i == 1 else '-%d' % i)
video_title = 'Fernsehkritik %d.%d' % (episode, i)
videos.append({
'id': video_id,
'url': video_url,
'ext': determine_ext(video_url),
'title': clean_html(get_element_by_id('eptitle', start_webpage)),
'description': clean_html(get_element_by_id('contentlist', start_webpage)),
'thumbnail': video_thumbnail
})
return videos
class FKTVPosteckeIE(InfoExtractor):
IE_NAME = u'fernsehkritik.tv:postecke'
_VALID_URL = r'(?:http://)?(?:www\.)?fernsehkritik.tv/inline-video/postecke.php\?(.*&)?ep=(?P<ep>[0-9]+)(&|$)'
_TEST = {
u'url': u'http://fernsehkritik.tv/inline-video/postecke.php?iframe=true&width=625&height=440&ep=120',
u'file': u'0120.flv',
u'md5': u'262f0adbac80317412f7e57b4808e5c4',
u'info_dict': {
u"title": u"Postecke 120"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
episode = int(mobj.group('ep'))
server = random.randint(2, 4)
video_id = '%04d' % episode
video_url = 'http://dl%d.fernsehkritik.tv/postecke/postecke%d.flv' % (server, episode)
video_title = 'Postecke %d' % episode
return {
'id': video_id,
'url': video_url,
'ext': determine_ext(video_url),
'title': video_title,
}

View File

@@ -9,7 +9,7 @@ from ..utils import (
class FlickrIE(InfoExtractor):
"""Information Extractor for Flickr videos"""
_VALID_URL = r'(?:https?://)?(?:www\.)?flickr\.com/photos/(?P<uploader_id>[\w\-_@]+)/(?P<id>\d+).*'
_VALID_URL = r'(?:https?://)?(?:www\.|secure\.)?flickr\.com/photos/(?P<uploader_id>[\w\-_@]+)/(?P<id>\d+).*'
_TEST = {
u'url': u'http://www.flickr.com/photos/forestwander-nature-pictures/5645318632/in/photostream/',
u'file': u'5645318632.mp4',

View File

@@ -0,0 +1,129 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
)
class FranceTVBaseInfoExtractor(InfoExtractor):
def _extract_video(self, video_id):
xml_desc = self._download_webpage(
'http://www.francetvinfo.fr/appftv/webservices/video/'
'getInfosOeuvre.php?id-diffusion='
+ video_id, video_id, 'Downloading XML config')
info = xml.etree.ElementTree.fromstring(xml_desc.encode('utf-8'))
manifest_url = info.find('videos/video/url').text
video_url = manifest_url.replace('manifest.f4m', 'index_2_av.m3u8')
video_url = video_url.replace('/z/', '/i/')
thumbnail_path = info.find('image').text
return {'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': info.find('titre').text,
'thumbnail': compat_urlparse.urljoin('http://pluzz.francetv.fr', thumbnail_path),
'description': info.find('synopsis').text,
}
class PluzzIE(FranceTVBaseInfoExtractor):
IE_NAME = u'pluzz.francetv.fr'
_VALID_URL = r'https?://pluzz\.francetv\.fr/videos/(.*?)\.html'
# Can't use tests, videos expire in 7 days
def _real_extract(self, url):
title = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, title)
video_id = self._search_regex(
r'data-diffusion="(\d+)"', webpage, 'ID')
return self._extract_video(video_id)
class FranceTvInfoIE(FranceTVBaseInfoExtractor):
IE_NAME = u'francetvinfo.fr'
_VALID_URL = r'https?://www\.francetvinfo\.fr/replay.*/(?P<title>.+).html'
_TEST = {
u'url': u'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
u'file': u'84981923.mp4',
u'info_dict': {
u'title': u'Soir 3',
},
u'params': {
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title)
video_id = self._search_regex(r'id-video=(\d+?)"', webpage, u'video id')
return self._extract_video(video_id)
class France2IE(FranceTVBaseInfoExtractor):
IE_NAME = u'france2.fr'
_VALID_URL = r'''(?x)https?://www\.france2\.fr/
(?:
emissions/.*?/videos/(?P<id>\d+)
| emission/(?P<key>[^/?]+)
)'''
_TEST = {
u'url': u'http://www.france2.fr/emissions/13h15-le-samedi-le-dimanche/videos/75540104',
u'file': u'75540104.mp4',
u'info_dict': {
u'title': u'13h15, le samedi...',
u'description': u'md5:2e5b58ba7a2d3692b35c792be081a03d',
},
u'params': {
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj.group('key'):
webpage = self._download_webpage(url, mobj.group('key'))
video_id = self._html_search_regex(
r'''(?x)<div\s+class="video-player">\s*
<a\s+href="http://videos.francetv.fr/video/([0-9]+)"\s+
class="francetv-video-player">''',
webpage, u'video ID')
else:
video_id = mobj.group('id')
return self._extract_video(video_id)
class GenerationQuoiIE(InfoExtractor):
IE_NAME = u'france2.fr:generation-quoi'
_VALID_URL = r'https?://generation-quoi\.france2\.fr/portrait/(?P<name>.*)(\?|$)'
_TEST = {
u'url': u'http://generation-quoi.france2.fr/portrait/garde-a-vous',
u'file': u'k7FJX8VBcvvLmX4wA5Q.mp4',
u'info_dict': {
u'title': u'Génération Quoi - Garde à Vous',
u'uploader': u'Génération Quoi',
},
u'params': {
# It uses Dailymotion
u'skip_download': True,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name = mobj.group('name')
info_url = compat_urlparse.urljoin(url, '/medias/video/%s.json' % name)
info_json = self._download_webpage(info_url, name)
info = json.loads(info_json)
return self.url_result('http://www.dailymotion.com/video/%s' % info['id'],
ie='Dailymotion')

View File

@@ -21,17 +21,15 @@ class FunnyOrDieIE(InfoExtractor):
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._html_search_regex(r'<video[^>]*>\s*<source[^>]*>\s*<source src="(?P<url>[^"]+)"',
video_url = self._search_regex(
[r'type="video/mp4" src="(.*?)"', r'src="([^>]*?)" type=\'video/mp4\''],
webpage, u'video URL', flags=re.DOTALL)
title = self._html_search_regex((r"<h1 class='player_page_h1'.*?>(?P<title>.*?)</h1>",
r'<title>(?P<title>[^<]+?)</title>'), webpage, 'title', flags=re.DOTALL)
info = {
'id': video_id,
'url': video_url,
'ext': 'mp4',
'title': title,
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
}
return [info]

View File

@@ -1,55 +1,59 @@
import re
import xml.etree.ElementTree
import json
from .common import InfoExtractor
from ..utils import (
unified_strdate,
compat_urllib_parse,
compat_urlparse,
unescapeHTML,
get_meta_content,
)
class GameSpotIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?gamespot\.com/.*-(?P<page_id>\d+)/?'
_TEST = {
u"url": u"http://www.gamespot.com/arma-iii/videos/arma-iii-community-guide-sitrep-i-6410818/",
u"file": u"6410818.mp4",
u"file": u"gs-2300-6410818.mp4",
u"md5": u"b2a30deaa8654fcccd43713a6b6a4825",
u"info_dict": {
u"title": u"Arma III - Community Guide: SITREP I",
u"upload_date": u"20130627",
u"title": u"Arma 3 - Community Guide: SITREP I",
u'description': u'Check out this video where some of the basics of Arma 3 is explained.',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_id = mobj.group('page_id')
page_id = video_id = mobj.group('page_id')
webpage = self._download_webpage(url, page_id)
video_id = self._html_search_regex([r'"og:video" content=".*?\?id=(\d+)"',
r'http://www\.gamespot\.com/videoembed/(\d+)'],
webpage, 'video id')
data = compat_urllib_parse.urlencode({'id': video_id, 'newplayer': '1'})
info_url = 'http://www.gamespot.com/pages/video_player/xml.php?' + data
info_xml = self._download_webpage(info_url, video_id)
doc = xml.etree.ElementTree.fromstring(info_xml)
clip_el = doc.find('./playList/clip')
data_video_json = self._search_regex(r'data-video=\'(.*?)\'', webpage, u'data video')
data_video = json.loads(unescapeHTML(data_video_json))
http_urls = [{'url': node.find('filePath').text,
'rate': int(node.find('rate').text)}
for node in clip_el.find('./httpURI')]
best_quality = sorted(http_urls, key=lambda f: f['rate'])[-1]
video_url = best_quality['url']
title = clip_el.find('./title').text
ext = video_url.rpartition('.')[2]
thumbnail_url = clip_el.find('./screenGrabURI').text
view_count = int(clip_el.find('./views').text)
upload_date = unified_strdate(clip_el.find('./postDate').text)
# Transform the manifest url to a link to the mp4 files
# they are used in mobile devices.
f4m_url = data_video['videoStreams']['f4m_stream']
f4m_path = compat_urlparse.urlparse(f4m_url).path
QUALITIES_RE = r'((,\d+)+,?)'
qualities = self._search_regex(QUALITIES_RE, f4m_path, u'qualities').strip(',').split(',')
http_path = f4m_path[1:].split('/', 1)[1]
http_template = re.sub(QUALITIES_RE, r'%s', http_path)
http_template = http_template.replace('.csmil/manifest.f4m', '')
http_template = compat_urlparse.urljoin('http://video.gamespotcdn.com/', http_template)
formats = []
for q in qualities:
formats.append({
'url': http_template % q,
'ext': 'mp4',
'format_id': q,
})
return [{
'id' : video_id,
'url' : video_url,
'ext' : ext,
'title' : title,
'thumbnail' : thumbnail_url,
'upload_date' : upload_date,
'view_count' : view_count,
}]
info = {
'id': data_video['guid'],
'title': compat_urllib_parse.unquote(data_video['title']),
'formats': formats,
'description': get_meta_content('description', webpage),
'thumbnail': self._og_search_thumbnail(webpage),
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@@ -8,11 +8,15 @@ from ..utils import (
compat_urllib_error,
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
ExtractorError,
smuggle_url,
unescapeHTML,
)
from .brightcove import BrightcoveIE
class GenericIE(InfoExtractor):
IE_DESC = u'Generic downloader that works on some sites'
_VALID_URL = r'.*'
@@ -23,21 +27,21 @@ class GenericIE(InfoExtractor):
u'file': u'13601338388002.mp4',
u'md5': u'85b90ccc9d73b4acd9138d3af4c27f89',
u'info_dict': {
u"uploader": u"www.hodiho.fr",
u"uploader": u"www.hodiho.fr",
u"title": u"R\u00e9gis plante sa Jeep"
}
},
# embedded vimeo video
{
u'url': u'http://www.8tv.cat/8aldia/videos/xavier-sala-i-martin-aquesta-tarda-a-8-al-dia/',
u'file': u'2371591881001.mp4',
u'md5': u'9e80619e0a94663f0bdc849b4566af19',
u'note': u'Test Brightcove downloads and detection in GenericIE',
u'url': u'http://skillsmatter.com/podcast/home/move-semanticsperfect-forwarding-and-rvalue-references',
u'file': u'22444065.mp4',
u'md5': u'2903896e23df39722c33f015af0666e2',
u'info_dict': {
u'title': u'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
u'uploader': u'8TV',
u'description': u'md5:a950cc4285c43e44d763d036710cd9cd',
u'title': u'ACCU 2011: Move Semantics,Perfect Forwarding, and Rvalue references- Scott Meyers- 13/04/2011',
u"uploader_id": u"skillsmatter",
u"uploader": u"Skills Matter",
}
},
}
]
def report_download_webpage(self, video_id):
@@ -107,8 +111,18 @@ class GenericIE(InfoExtractor):
return new_url
def _real_extract(self, url):
new_url = self._test_redirect(url)
if new_url: return [self.url_result(new_url)]
parsed_url = compat_urlparse.urlparse(url)
if not parsed_url.scheme:
self._downloader.report_warning('The url doesn\'t specify the protocol, trying with http')
return self.url_result('http://' + url)
try:
new_url = self._test_redirect(url)
if new_url:
return [self.url_result(new_url)]
except compat_urllib_error.HTTPError:
# This may be a stupid server that doesn't like HEAD, our UA, or so
pass
video_id = url.split('/')[-1]
try:
@@ -116,16 +130,24 @@ class GenericIE(InfoExtractor):
except ValueError:
# since this is the last-resort InfoExtractor, if
# this error is thrown, it'll be thrown here
raise ExtractorError(u'Invalid URL: %s' % url)
raise ExtractorError(u'Failed to download URL: %s' % url)
self.report_extraction(video_id)
# Look for BrigthCove:
m_brightcove = re.search(r'<object.+?class=([\'"]).*?BrightcoveExperience.*?\1.+?</object>', webpage, re.DOTALL)
# Look for BrightCove:
m_brightcove = re.search(r'<object[^>]+?class=([\'"])[^>]*?BrightcoveExperience.*?\1.+?</object>', webpage, re.DOTALL)
if m_brightcove is not None:
self.to_screen(u'Brightcove video detected.')
bc_url = BrightcoveIE._build_brighcove_url(m_brightcove.group())
return self.url_result(bc_url, 'Brightcove')
# Look for embedded Vimeo player
mobj = re.search(
r'<iframe\s+src="(https?://player.vimeo.com/video/.*?)"', webpage)
if mobj:
player_url = unescapeHTML(mobj.group(1))
surl = smuggle_url(player_url, {'Referer': url})
return self.url_result(surl, 'Vimeo')
# Start with something easy: JW Player in SWFObject
mobj = re.search(r'flashvars: [\'"](?:.*&)?file=(http[^\'"&]*)', webpage)
if mobj is None:
@@ -145,15 +167,19 @@ class GenericIE(InfoExtractor):
if m_video_type is not None:
mobj = re.search(r'<meta.*?property="og:video".*?content="(.*?)"', webpage)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
# HTML5 video
mobj = re.search(r'<video[^<]*(?:>.*?<source.*?)? src="([^"]+)"', webpage, flags=re.DOTALL)
if mobj is None:
raise ExtractorError(u'Unsupported URL: %s' % url)
# It's possible that one of the regexes
# matched, but returned an empty group:
if mobj.group(1) is None:
raise ExtractorError(u'Invalid URL: %s' % url)
raise ExtractorError(u'Did not find a valid video URL at %s' % url)
video_url = compat_urllib_parse.unquote(mobj.group(1))
video_id = os.path.basename(video_url)
video_url = mobj.group(1)
video_url = compat_urlparse.urljoin(url, video_url)
video_id = compat_urllib_parse.unquote(os.path.basename(video_url))
# here's a fun little line of code for you:
video_extension = os.path.splitext(video_id)[1][1:]

View File

@@ -40,7 +40,9 @@ class GooglePlusIE(InfoExtractor):
self.report_extraction(video_id)
# Extract update date
upload_date = self._html_search_regex('title="Timestamp">(.*?)</a>',
upload_date = self._html_search_regex(
r'''(?x)<a.+?class="o-T-s\s[^"]+"\s+style="display:\s*none"\s*>
([0-9]{4}-[0-9]{2}-[0-9]{2})</a>''',
webpage, u'upload date', fatal=False)
if upload_date:
# Convert timestring to a format suitable for filename
@@ -57,8 +59,8 @@ class GooglePlusIE(InfoExtractor):
webpage, 'title', default=u'NA')
# Step 2, Simulate clicking the image box to launch video
DOMAIN = 'https://plus.google.com'
video_page = self._search_regex(r'<a href="((?:%s)?/photos/.*?)"' % re.escape(DOMAIN),
DOMAIN = 'https://plus.google.com/'
video_page = self._search_regex(r'<a href="((?:%s)?photos/.*?)"' % re.escape(DOMAIN),
webpage, u'video page URL')
if not video_page.startswith(DOMAIN):
video_page = DOMAIN + video_page

View File

@@ -0,0 +1,37 @@
# -*- coding: utf-8 -*-
import re
import json
from .common import InfoExtractor
from ..utils import determine_ext
class HarkIE(InfoExtractor):
_VALID_URL = r'https?://www\.hark\.com/clips/(.+?)-.+'
_TEST = {
u'url': u'http://www.hark.com/clips/mmbzyhkgny-obama-beyond-the-afghan-theater-we-only-target-al-qaeda-on-may-23-2013',
u'file': u'mmbzyhkgny.mp3',
u'md5': u'6783a58491b47b92c7c1af5a77d4cbee',
u'info_dict': {
u'title': u"Obama: 'Beyond The Afghan Theater, We Only Target Al Qaeda' on May 23, 2013",
u'description': u'President Barack Obama addressed the nation live on May 23, 2013 in a speech aimed at addressing counter-terrorism policies including the use of drone strikes, detainees at Guantanamo Bay prison facility, and American citizens who are terrorists.',
u'duration': 11,
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
json_url = "http://www.hark.com/clips/%s.json" %(video_id)
info_json = self._download_webpage(json_url, video_id)
info = json.loads(info_json)
final_url = info['url']
return {'id': video_id,
'url' : final_url,
'title': info['name'],
'ext': determine_ext(final_url),
'description': info['description'],
'thumbnail': info['image_original'],
'duration': info['duration'],
}

View File

@@ -7,11 +7,11 @@ from .common import InfoExtractor
class HotNewHipHopIE(InfoExtractor):
_VALID_URL = r'http://www\.hotnewhiphop.com/.*\.(?P<id>.*)\.html'
_TEST = {
u'url': u"http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html'",
u'url': u"http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html",
u'file': u'1435540.mp3',
u'md5': u'2c2cd2f76ef11a9b3b581e8b232f3d96',
u'info_dict': {
u"title": u"Freddie Gibbs Songs - Lay It Down"
u"title": u"Freddie Gibbs - Lay It Down"
}
}

View File

@@ -19,8 +19,7 @@ class HowcastIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage_url = 'http://www.howcast.com/videos/' + video_id
webpage = self._download_webpage(webpage_url, video_id)
webpage = self._download_webpage(url, video_id)
self.report_extraction(video_id)

View File

@@ -13,7 +13,7 @@ class IGNIE(InfoExtractor):
Some videos of it.ign.com are also supported
"""
_VALID_URL = r'https?://.+?\.ign\.com/(?:videos|show_videos)(/.+)?/(?P<name_or_id>.+)'
_VALID_URL = r'https?://.+?\.ign\.com/(?P<type>videos|show_videos|articles|(?:[^/]*/feature))(/.+)?/(?P<name_or_id>.+)'
IE_NAME = u'ign.com'
_CONFIG_URL_TEMPLATE = 'http://www.ign.com/videos/configs/id/%s.config'
@@ -21,15 +21,39 @@ class IGNIE(InfoExtractor):
r'id="my_show_video">.*?<p>(.*?)</p>',
]
_TEST = {
u'url': u'http://www.ign.com/videos/2013/06/05/the-last-of-us-review',
u'file': u'8f862beef863986b2785559b9e1aa599.mp4',
u'md5': u'eac8bdc1890980122c3b66f14bdd02e9',
u'info_dict': {
u'title': u'The Last of Us Review',
u'description': u'md5:c8946d4260a4d43a00d5ae8ed998870c',
}
}
_TESTS = [
{
u'url': u'http://www.ign.com/videos/2013/06/05/the-last-of-us-review',
u'file': u'8f862beef863986b2785559b9e1aa599.mp4',
u'md5': u'eac8bdc1890980122c3b66f14bdd02e9',
u'info_dict': {
u'title': u'The Last of Us Review',
u'description': u'md5:c8946d4260a4d43a00d5ae8ed998870c',
}
},
{
u'url': u'http://me.ign.com/en/feature/15775/100-little-things-in-gta-5-that-will-blow-your-mind',
u'playlist': [
{
u'file': u'5ebbd138523268b93c9141af17bec937.mp4',
u'info_dict': {
u'title': u'GTA 5 Video Review',
u'description': u'Rockstar drops the mic on this generation of games. Watch our review of the masterly Grand Theft Auto V.',
},
},
{
u'file': u'638672ee848ae4ff108df2a296418ee2.mp4',
u'info_dict': {
u'title': u'GTA 5\'s Twisted Beauty in Super Slow Motion',
u'description': u'The twisted beauty of GTA 5 in stunning slow motion.',
},
},
],
u'params': {
u'skip_download': True,
},
},
]
def _find_video_id(self, webpage):
res_id = [r'data-video-id="(.+?)"',
@@ -41,7 +65,18 @@ class IGNIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
name_or_id = mobj.group('name_or_id')
page_type = mobj.group('type')
webpage = self._download_webpage(url, name_or_id)
if page_type == 'articles':
video_url = self._search_regex(r'var videoUrl = "(.+?)"', webpage, u'video url')
return self.url_result(video_url, ie='IGN')
elif page_type != 'video':
multiple_urls = re.findall(
'<param name="flashvars" value="[^"]*?url=(https?://www\.ign\.com/videos/.*?)["&]',
webpage)
if multiple_urls:
return [self.url_result(u, ie='IGN') for u in multiple_urls]
video_id = self._find_video_id(webpage)
result = self._get_video_info(video_id)
description = self._html_search_regex(self._DESCRIPTION_RE,
@@ -68,7 +103,7 @@ class IGNIE(InfoExtractor):
class OneUPIE(IGNIE):
"""Extractor for 1up.com, it uses the ign videos system."""
_VALID_URL = r'https?://gamevideos.1up.com/video/id/(?P<name_or_id>.+)'
_VALID_URL = r'https?://gamevideos.1up.com/(?P<type>video)/id/(?P<name_or_id>.+)'
IE_NAME = '1up.com'
_DESCRIPTION_RE = r'<div id="vid_summary">(.+?)</div>'
@@ -83,6 +118,9 @@ class OneUPIE(IGNIE):
}
}
# Override IGN tests
_TESTS = []
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
id = mobj.group('name_or_id')

View File

@@ -5,7 +5,7 @@ from .common import InfoExtractor
class InaIE(InfoExtractor):
"""Information Extractor for Ina.fr"""
_VALID_URL = r'(?:http://)?(?:www\.)?ina\.fr/video/(?P<id>I[0-9]+)/.*'
_VALID_URL = r'(?:http://)?(?:www\.)?ina\.fr/video/(?P<id>I?[A-F0-9]+)/.*'
_TEST = {
u'url': u'www.ina.fr/video/I12055569/francois-hollande-je-crois-que-c-est-clair-video.html',
u'file': u'I12055569.mp4',

View File

@@ -0,0 +1,87 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
compat_urllib_parse,
xpath_with_ns,
determine_ext,
)
class InternetVideoArchiveIE(InfoExtractor):
_VALID_URL = r'https?://video\.internetvideoarchive\.net/flash/players/.*?\?.*?publishedid.*?'
_TEST = {
u'url': u'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?customerid=69249&publishedid=452693&playerid=247',
u'file': u'452693.mp4',
u'info_dict': {
u'title': u'SKYFALL',
u'description': u'In SKYFALL, Bond\'s loyalty to M is tested as her past comes back to haunt her. As MI6 comes under attack, 007 must track down and destroy the threat, no matter how personal the cost.',
u'duration': 156,
},
}
@staticmethod
def _build_url(query):
return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
@staticmethod
def _clean_query(query):
NEEDED_ARGS = ['publishedid', 'customerid']
query_dic = compat_urlparse.parse_qs(query)
cleaned_dic = dict((k,v[0]) for (k,v) in query_dic.items() if k in NEEDED_ARGS)
# Other player ids return m3u8 urls
cleaned_dic['playerid'] = '247'
cleaned_dic['videokbrate'] = '100000'
return compat_urllib_parse.urlencode(cleaned_dic)
def _real_extract(self, url):
query = compat_urlparse.urlparse(url).query
query_dic = compat_urlparse.parse_qs(query)
video_id = query_dic['publishedid'][0]
url = self._build_url(query)
flashconfiguration_xml = self._download_webpage(url, video_id,
u'Downloading flash configuration')
flashconfiguration = xml.etree.ElementTree.fromstring(flashconfiguration_xml.encode('utf-8'))
file_url = flashconfiguration.find('file').text
file_url = file_url.replace('/playlist.aspx', '/mrssplaylist.aspx')
# Replace some of the parameters in the query to get the best quality
# and http links (no m3u8 manifests)
file_url = re.sub(r'(?<=\?)(.+)$',
lambda m: self._clean_query(m.group()),
file_url)
info_xml = self._download_webpage(file_url, video_id,
u'Downloading video info')
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
item = info.find('channel/item')
def _bp(p):
return xpath_with_ns(p,
{'media': 'http://search.yahoo.com/mrss/',
'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats'})
formats = []
for content in item.findall(_bp('media:group/media:content')):
attr = content.attrib
f_url = attr['url']
formats.append({
'url': f_url,
'ext': determine_ext(f_url),
'width': int(attr['width']),
'bitrate': int(attr['bitrate']),
})
formats = sorted(formats, key=lambda f: f['bitrate'])
info = {
'id': video_id,
'title': item.find('title').text,
'formats': formats,
'thumbnail': item.find(_bp('media:thumbnail')).attrib['url'],
'description': item.find('description').text,
'duration': int(attr['duration']),
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@@ -0,0 +1,52 @@
# coding: utf-8
import json
import re
import xml.etree.ElementTree
from .common import InfoExtractor
class JeuxVideoIE(InfoExtractor):
_VALID_URL = r'http://.*?\.jeuxvideo\.com/.*/(.*?)-\d+\.htm'
_TEST = {
u'url': u'http://www.jeuxvideo.com/reportages-videos-jeux/0004/00046170/tearaway-playstation-vita-gc-2013-tearaway-nous-presente-ses-papiers-d-identite-00115182.htm',
u'file': u'5182.mp4',
u'md5': u'046e491afb32a8aaac1f44dd4ddd54ee',
u'info_dict': {
u'title': u'GC 2013 : Tearaway nous présente ses papiers d\'identité',
u'description': u'Lorsque les développeurs de LittleBigPlanet proposent un nouveau titre, on ne peut que s\'attendre à un résultat original et fort attrayant.\n',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
title = re.match(self._VALID_URL, url).group(1)
webpage = self._download_webpage(url, title)
xml_link = self._html_search_regex(
r'<param name="flashvars" value="config=(.*?)" />',
webpage, u'config URL')
video_id = self._search_regex(
r'http://www\.jeuxvideo\.com/config/\w+/\d+/(.*?)/\d+_player\.xml',
xml_link, u'video ID')
xml_config = self._download_webpage(
xml_link, title, u'Downloading XML config')
config = xml.etree.ElementTree.fromstring(xml_config.encode('utf-8'))
info_json = self._search_regex(
r'(?sm)<format\.json>(.*?)</format\.json>',
xml_config, u'JSON information')
info = json.loads(info_json)['versions'][0]
video_url = 'http://video720.jeuxvideo.com/' + info['file']
return {
'id': video_id,
'title': config.find('titre_video').text,
'ext': 'mp4',
'url': video_url,
'description': self._og_search_description(webpage),
'thumbnail': config.find('image').text,
}

View File

@@ -0,0 +1,39 @@
import re
from .common import InfoExtractor
from ..utils import determine_ext
class KankanIE(InfoExtractor):
_VALID_URL = r'https?://(?:.*?\.)?kankan\.com/.+?/(?P<id>\d+)\.shtml'
_TEST = {
u'url': u'http://yinyue.kankan.com/vod/48/48863.shtml',
u'file': u'48863.flv',
u'md5': u'29aca1e47ae68fc28804aca89f29507e',
u'info_dict': {
u'title': u'Ready To Go',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = self._search_regex(r'(?:G_TITLE=|G_MOVIE_TITLE = )[\'"](.+?)[\'"]', webpage, u'video title')
surls = re.search(r'surls:\[\'.+?\'\]|lurl:\'.+?\.flv\'', webpage).group(0)
gcids = re.findall(r"http://.+?/.+?/(.+?)/", surls)
gcid = gcids[-1]
video_info_page = self._download_webpage('http://p2s.cl.kankan.com/getCdnresource_flv?gcid=%s' % gcid,
video_id, u'Downloading video url info')
ip = self._search_regex(r'ip:"(.+?)"', video_info_page, u'video url ip')
path = self._search_regex(r'path:"(.+?)"', video_info_page, u'video url path')
video_url = 'http://%s%s' % (ip, path)
return {'id': video_id,
'title': title,
'url': video_url,
'ext': determine_ext(video_url),
}

View File

@@ -4,10 +4,10 @@ from .common import InfoExtractor
class KeekIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?keek\.com/(?:!|\w+/keeks/)(?P<videoID>\w+)'
_VALID_URL = r'https?://(?:www\.)?keek\.com/(?:!|\w+/keeks/)(?P<videoID>\w+)'
IE_NAME = u'keek'
_TEST = {
u'url': u'http://www.keek.com/ytdl/keeks/NODfbab',
u'url': u'https://www.keek.com/ytdl/keeks/NODfbab',
u'file': u'NODfbab.mp4',
u'md5': u'9b0636f8c0f7614afa4ea5e4c6e57e83',
u'info_dict': {

View File

@@ -0,0 +1,37 @@
import re
from .common import InfoExtractor
class KickStarterIE(InfoExtractor):
_VALID_URL = r'https?://www\.kickstarter\.com/projects/(?P<id>\d*)/.*'
_TEST = {
u"url": u"https://www.kickstarter.com/projects/1404461844/intersection-the-story-of-josh-grant?ref=home_location",
u"file": u"1404461844.mp4",
u"md5": u"c81addca81327ffa66c642b5d8b08cab",
u"info_dict": {
u"title": u"Intersection: The Story of Josh Grant by Kyle Cowling",
},
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('id')
webpage_src = self._download_webpage(url, video_id)
video_url = self._search_regex(r'data-video="(.*?)">',
webpage_src, u'video URL')
if 'mp4' in video_url:
ext = 'mp4'
else:
ext = 'flv'
video_title = self._html_search_regex(r"<title>(.*?)</title>",
webpage_src, u'title').rpartition(u'\u2014 Kickstarter')[0].strip()
results = [{
'id': video_id,
'url': video_url,
'title': video_title,
'ext': ext,
}]
return results

View File

@@ -2,7 +2,12 @@ import re
import json
from .common import InfoExtractor
from ..utils import compat_urllib_parse_urlparse, compat_urlparse
from ..utils import (
compat_urllib_parse_urlparse,
compat_urlparse,
get_meta_content,
ExtractorError,
)
class LivestreamIE(InfoExtractor):
@@ -35,8 +40,11 @@ class LivestreamIE(InfoExtractor):
if video_id is None:
# This is an event page:
api_url = self._search_regex(r'event_design_eventId: \'(.+?)\'',
webpage, 'api url')
player = get_meta_content('twitter:player', webpage)
if player is None:
raise ExtractorError('Couldn\'t extract event api url')
api_url = player.replace('/player', '')
api_url = re.sub(r'^(https?://)(new\.)', r'\1api.\2', api_url)
info = json.loads(self._download_webpage(api_url, event_name,
u'Downloading event info'))
videos = [self._extract_video_info(video_data['data'])

View File

@@ -122,7 +122,7 @@ class MetacafeIE(InfoExtractor):
video_title = self._html_search_regex(r'(?im)<title>(.*) - Video</title>', webpage, u'title')
description = self._og_search_description(webpage)
video_uploader = self._html_search_regex(
r'submitter=(.*?);|googletag\.pubads\(\)\.setTargeting\("channel","([^"]+)"\);',
r'submitter=(.*?);|googletag\.pubads\(\)\.setTargeting\("(?:channel|submiter)","([^"]+)"\);',
webpage, u'uploader nickname', fatal=False)
return {

View File

@@ -0,0 +1,55 @@
import re
import xml.etree.ElementTree
import operator
from .common import InfoExtractor
class MetacriticIE(InfoExtractor):
_VALID_URL = r'https?://www\.metacritic\.com/.+?/trailers/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222',
u'file': u'3698222.mp4',
u'info_dict': {
u'title': u'inFamous: Second Son - inSide Sucker Punch: Smoke & Mirrors',
u'description': u'Take a peak behind-the-scenes to see how Sucker Punch brings smoke into the universe of inFAMOUS Second Son on the PS4.',
u'duration': 221,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
# The xml is not well formatted, there are raw '&'
info_xml = self._download_webpage('http://www.metacritic.com/video_data?video=' + video_id,
video_id, u'Downloading info xml').replace('&', '&amp;')
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
clip = next(c for c in info.findall('playList/clip') if c.find('id').text == video_id)
formats = []
for videoFile in clip.findall('httpURI/videoFile'):
rate_str = videoFile.find('rate').text
video_url = videoFile.find('filePath').text
formats.append({
'url': video_url,
'ext': 'mp4',
'format_id': rate_str,
'rate': int(rate_str),
})
formats.sort(key=operator.itemgetter('rate'))
description = self._html_search_regex(r'<b>Description:</b>(.*?)</p>',
webpage, u'description', flags=re.DOTALL)
info = {
'id': video_id,
'title': clip.find('title').text,
'formats': formats,
'description': description,
'duration': int(clip.find('duration').text),
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@@ -0,0 +1,74 @@
import re
import json
from .common import InfoExtractor
from ..utils import (
clean_html,
get_element_by_id,
)
class TechTVMITIE(InfoExtractor):
IE_NAME = u'techtv.mit.edu'
_VALID_URL = r'https?://techtv\.mit\.edu/(videos|embeds)/(?P<id>\d+)'
_TEST = {
u'url': u'http://techtv.mit.edu/videos/25418-mit-dna-learning-center-set',
u'file': u'25418.mp4',
u'md5': u'1f8cb3e170d41fd74add04d3c9330e5f',
u'info_dict': {
u'title': u'MIT DNA Learning Center Set',
u'description': u'md5:82313335e8a8a3f243351ba55bc1b474',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
raw_page = self._download_webpage(
'http://techtv.mit.edu/videos/%s' % video_id, video_id)
clean_page = re.compile(u'<!--.*?-->', re.S).sub(u'', raw_page)
base_url = self._search_regex(r'ipadUrl: \'(.+?cloudfront.net/)',
raw_page, u'base url')
formats_json = self._search_regex(r'bitrates: (\[.+?\])', raw_page,
u'video formats')
formats = json.loads(formats_json)
formats = sorted(formats, key=lambda f: f['bitrate'])
title = get_element_by_id('edit-title', clean_page)
description = clean_html(get_element_by_id('edit-description', clean_page))
thumbnail = self._search_regex(r'playlist:.*?url: \'(.+?)\'',
raw_page, u'thumbnail', flags=re.DOTALL)
return {'id': video_id,
'title': title,
'url': base_url + formats[-1]['url'].replace('mp4:', ''),
'ext': 'mp4',
'description': description,
'thumbnail': thumbnail,
}
class MITIE(TechTVMITIE):
IE_NAME = u'video.mit.edu'
_VALID_URL = r'https?://video\.mit\.edu/watch/(?P<title>[^/]+)'
_TEST = {
u'url': u'http://video.mit.edu/watch/the-government-is-profiling-you-13222/',
u'file': u'21783.mp4',
u'md5': u'7db01d5ccc1895fc5010e9c9e13648da',
u'info_dict': {
u'title': u'The Government is Profiling You',
u'description': u'md5:ad5795fe1e1623b73620dbfd47df9afd',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title)
self.to_screen('%s: Extracting %s url' % (page_title, TechTVMITIE.IE_NAME))
embed_url = self._search_regex(r'<iframe .*?src="(.+?)"', webpage,
u'embed url')
return self.url_result(embed_url, ie='TechTVMIT')

View File

@@ -5,34 +5,27 @@ import socket
from .common import InfoExtractor
from ..utils import (
compat_http_client,
compat_str,
compat_urllib_error,
compat_urllib_request,
ExtractorError,
unified_strdate,
)
class MixcloudIE(InfoExtractor):
_WORKING = False # New API, but it seems good http://www.mixcloud.com/developers/documentation/
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([\w\d-]+)/([\w\d-]+)'
IE_NAME = u'mixcloud'
def report_download_json(self, file_id):
"""Report JSON download."""
self.to_screen(u'Downloading json')
def get_urls(self, jsonData, fmt, bitrate='best'):
"""Get urls from 'audio_formats' section in json"""
try:
bitrate_list = jsonData[fmt]
if bitrate is None or bitrate == 'best' or bitrate not in bitrate_list:
bitrate = max(bitrate_list) # select highest
url_list = jsonData[fmt][bitrate]
except TypeError: # we have no bitrate info.
url_list = jsonData[fmt]
return url_list
_TEST = {
u'url': u'http://www.mixcloud.com/dholbach/cryptkeeper/',
u'file': u'dholbach-cryptkeeper.mp3',
u'info_dict': {
u'title': u'Cryptkeeper',
u'description': u'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.',
u'uploader': u'Daniel Holbach',
u'uploader_id': u'dholbach',
u'upload_date': u'20111115',
},
}
def check_urls(self, url_list):
"""Returns 1st active url from list"""
@@ -45,71 +38,32 @@ class MixcloudIE(InfoExtractor):
return None
def _print_formats(self, formats):
print('Available formats:')
for fmt in formats.keys():
for b in formats[fmt]:
try:
ext = formats[fmt][b][0]
print('%s\t%s\t[%s]' % (fmt, b, ext.split('.')[-1]))
except TypeError: # we have no bitrate info
ext = formats[fmt][0]
print('%s\t%s\t[%s]' % (fmt, '??', ext.split('.')[-1]))
break
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
# extract uploader & filename from url
uploader = mobj.group(1).decode('utf-8')
file_id = uploader + "-" + mobj.group(2).decode('utf-8')
# construct API request
file_url = 'http://www.mixcloud.com/api/1/cloudcast/' + '/'.join(url.split('/')[-3:-1]) + '.json'
# retrieve .json file with links to files
request = compat_urllib_request.Request(file_url)
try:
self.report_download_json(file_url)
jsonData = compat_urllib_request.urlopen(request).read()
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
raise ExtractorError(u'Unable to retrieve file: %s' % compat_str(err))
uploader = mobj.group(1)
cloudcast_name = mobj.group(2)
track_id = '-'.join((uploader, cloudcast_name))
api_url = 'http://api.mixcloud.com/%s/%s/' % (uploader, cloudcast_name)
webpage = self._download_webpage(url, track_id)
json_data = self._download_webpage(api_url, track_id,
u'Downloading cloudcast info')
info = json.loads(json_data)
# parse JSON
json_data = json.loads(jsonData)
player_url = json_data['player_swf_url']
formats = dict(json_data['audio_formats'])
preview_url = self._search_regex(r'data-preview-url="(.+?)"', webpage, u'preview url')
song_url = preview_url.replace('/previews/', '/cloudcasts/originals/')
template_url = re.sub(r'(stream\d*)', 'stream%d', song_url)
final_song_url = self.check_urls(template_url % i for i in range(30))
req_format = self._downloader.params.get('format', None)
if self._downloader.params.get('listformats', None):
self._print_formats(formats)
return
if req_format is None or req_format == 'best':
for format_param in formats.keys():
url_list = self.get_urls(formats, format_param)
# check urls
file_url = self.check_urls(url_list)
if file_url is not None:
break # got it!
else:
if req_format not in formats:
raise ExtractorError(u'Format is not available')
url_list = self.get_urls(formats, req_format)
file_url = self.check_urls(url_list)
format_param = req_format
return [{
'id': file_id.decode('utf-8'),
'url': file_url.decode('utf-8'),
'uploader': uploader.decode('utf-8'),
'upload_date': None,
'title': json_data['name'],
'ext': file_url.split('.')[-1].decode('utf-8'),
'format': (format_param is None and u'NA' or format_param.decode('utf-8')),
'thumbnail': json_data['thumbnail_url'],
'description': json_data['description'],
'player_url': player_url.decode('utf-8'),
}]
return {
'id': track_id,
'title': info['name'],
'url': final_song_url,
'ext': 'mp3',
'description': info['description'],
'thumbnail': info['pictures'].get('extra_large'),
'uploader': info['user']['name'],
'uploader_id': info['user']['username'],
'upload_date': unified_strdate(info['created_time']),
'view_count': info['play_count'],
}

View File

@@ -54,23 +54,26 @@ class MTVIE(InfoExtractor):
def _get_thumbnail_url(self, uri, itemdoc):
return 'http://mtv.mtvnimages.com/uri/' + uri
def _extract_video_url(self, metadataXml):
def _extract_video_formats(self, metadataXml):
if '/error_country_block.swf' in metadataXml:
raise ExtractorError(u'This video is not available from your country.', expected=True)
mdoc = xml.etree.ElementTree.fromstring(metadataXml.encode('utf-8'))
renditions = mdoc.findall('.//rendition')
# For now, always pick the highest quality.
rendition = renditions[-1]
try:
_,_,ext = rendition.attrib['type'].partition('/')
format = ext + '-' + rendition.attrib['width'] + 'x' + rendition.attrib['height'] + '_' + rendition.attrib['bitrate']
rtmp_video_url = rendition.find('./src').text
except KeyError:
raise ExtractorError('Invalid rendition field.')
video_url = self._transform_rtmp_url(rtmp_video_url)
return {'ext': ext, 'url': video_url, 'format': format}
formats = []
for rendition in mdoc.findall('.//rendition'):
try:
_, _, ext = rendition.attrib['type'].partition('/')
rtmp_video_url = rendition.find('./src').text
formats.append({'ext': ext,
'url': self._transform_rtmp_url(rtmp_video_url),
'format_id': rendition.get('bitrate'),
'width': int(rendition.get('width')),
'height': int(rendition.get('height')),
})
except (KeyError, TypeError):
raise ExtractorError('Invalid rendition field.')
return formats
def _get_video_info(self, itemdoc):
uri = itemdoc.find('guid').text
@@ -81,19 +84,25 @@ class MTVIE(InfoExtractor):
mediagen_url += '&acceptMethods=fms'
mediagen_page = self._download_webpage(mediagen_url, video_id,
u'Downloading video urls')
video_info = self._extract_video_url(mediagen_page)
description_node = itemdoc.find('description')
if description_node is not None:
description = description_node.text
description = description_node.text.strip()
else:
description = None
video_info.update({'title': itemdoc.find('title').text,
'id': video_id,
'thumbnail': self._get_thumbnail_url(uri, itemdoc),
'description': description,
})
return video_info
info = {
'title': itemdoc.find('title').text,
'formats': self._extract_video_formats(mediagen_page),
'id': video_id,
'thumbnail': self._get_thumbnail_url(uri, itemdoc),
'description': description,
}
# TODO: Remove when #980 has been merged
info.update(info['formats'][-1])
return info
def _get_videos_info(self, uri):
video_id = self._id_from_uri(uri)

View File

@@ -0,0 +1,64 @@
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
determine_ext,
)
class MuzuTVIE(InfoExtractor):
_VALID_URL = r'https?://www.muzu.tv/(.+?)/(.+?)/(?P<id>\d+)'
IE_NAME = u'muzu.tv'
_TEST = {
u'url': u'http://www.muzu.tv/defected/marcashken-featuring-sos-cat-walk-original-mix-music-video/1981454/',
u'file': u'1981454.mp4',
u'md5': u'98f8b2c7bc50578d6a0364fff2bfb000',
u'info_dict': {
u'title': u'Cat Walk (Original Mix)',
u'description': u'md5:90e868994de201b2570e4e5854e19420',
u'uploader': u'MarcAshken featuring SOS',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
info_data = compat_urllib_parse.urlencode({'format': 'json',
'url': url,
})
video_info_page = self._download_webpage('http://www.muzu.tv/api/oembed/?%s' % info_data,
video_id, u'Downloading video info')
info = json.loads(video_info_page)
player_info_page = self._download_webpage('http://player.muzu.tv/player/playerInit?ai=%s' % video_id,
video_id, u'Downloading player info')
video_info = json.loads(player_info_page)['videos'][0]
for quality in ['1080' , '720', '480', '360']:
if video_info.get('v%s' % quality):
break
data = compat_urllib_parse.urlencode({'ai': video_id,
# Even if each time you watch a video the hash changes,
# it seems to work for different videos, and it will work
# even if you use any non empty string as a hash
'viewhash': 'VBNff6djeV4HV5TRPW5kOHub2k',
'device': 'web',
'qv': quality,
})
video_url_page = self._download_webpage('http://player.muzu.tv/player/requestVideo?%s' % data,
video_id, u'Downloading video url')
video_url_info = json.loads(video_url_page)
video_url = video_url_info['url']
return {'id': video_id,
'title': info['title'],
'url': video_url,
'ext': determine_ext(video_url),
'thumbnail': info['thumbnail_url'],
'description': info['description'],
'uploader': info['author_name'],
}

View File

@@ -2,11 +2,13 @@ import binascii
import base64
import hashlib
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_ord,
compat_urllib_parse,
compat_urllib_request,
ExtractorError,
)
@@ -16,7 +18,7 @@ from ..utils import (
class MyVideoIE(InfoExtractor):
"""Information Extractor for myvideo.de."""
_VALID_URL = r'(?:http://)?(?:www\.)?myvideo\.de/watch/([0-9]+)/([^?/]+).*'
_VALID_URL = r'(?:http://)?(?:www\.)?myvideo\.de/(?:[^/]+/)?watch/([0-9]+)/([^?/]+).*'
IE_NAME = u'myvideo'
_TEST = {
u'url': u'http://www.myvideo.de/watch/8229274/bowling_fail_or_win',
@@ -85,6 +87,20 @@ class MyVideoIE(InfoExtractor):
'ext': video_ext,
}]
mobj = re.search(r'data-video-service="/service/data/video/%s/config' % video_id, webpage)
if mobj is not None:
request = compat_urllib_request.Request('http://www.myvideo.de/service/data/video/%s/config' % video_id, '')
response = self._download_webpage(request, video_id,
u'Downloading video info')
info = json.loads(base64.b64decode(response).decode('utf-8'))
return {'id': video_id,
'title': info['title'],
'url': info['streaming_url'].replace('rtmpe', 'rtmpt'),
'play_path': info['filename'],
'ext': 'flv',
'thumbnail': info['thumbnail'][0]['url'],
}
# try encxml
mobj = re.search('var flashvars={(.+?)}', webpage)
if mobj is None:

View File

@@ -0,0 +1,73 @@
# encoding: utf-8
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urllib_parse,
ExtractorError,
)
class NaverIE(InfoExtractor):
_VALID_URL = r'https?://tvcast\.naver\.com/v/(?P<id>\d+)'
_TEST = {
u'url': u'http://tvcast.naver.com/v/81652',
u'file': u'81652.mp4',
u'info_dict': {
u'title': u'[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
u'description': u'합격불변의 법칙 메가스터디 | 메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
u'upload_date': u'20130903',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group(1)
webpage = self._download_webpage(url, video_id)
m_id = re.search(r'var rmcPlayer = new nhn.rmcnmv.RMCVideoPlayer\("(.+?)", "(.+?)"',
webpage)
if m_id is None:
raise ExtractorError(u'couldn\'t extract vid and key')
vid = m_id.group(1)
key = m_id.group(2)
query = compat_urllib_parse.urlencode({'vid': vid, 'inKey': key,})
query_urls = compat_urllib_parse.urlencode({
'masterVid': vid,
'protocol': 'p2p',
'inKey': key,
})
info_xml = self._download_webpage(
'http://serviceapi.rmcnmv.naver.com/flash/videoInfo.nhn?' + query,
video_id, u'Downloading video info')
urls_xml = self._download_webpage(
'http://serviceapi.rmcnmv.naver.com/flash/playableEncodingOption.nhn?' + query_urls,
video_id, u'Downloading video formats info')
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8'))
urls = xml.etree.ElementTree.fromstring(urls_xml.encode('utf-8'))
formats = []
for format_el in urls.findall('EncodingOptions/EncodingOption'):
domain = format_el.find('Domain').text
if domain.startswith('rtmp'):
continue
formats.append({
'url': domain + format_el.find('uri').text,
'ext': 'mp4',
'width': int(format_el.find('width').text),
'height': int(format_el.find('height').text),
})
info = {
'id': video_id,
'title': info.find('Subject').text,
'formats': formats,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'upload_date': info.find('WriteDate').text.replace('.', ''),
'view_count': int(info.find('PlayCount').text),
}
# TODO: Remove when #980 has been merged
info.update(formats[-1])
return info

View File

@@ -0,0 +1,33 @@
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import find_xpath_attr, compat_str
class NBCNewsIE(InfoExtractor):
_VALID_URL = r'https?://www\.nbcnews\.com/video/.+?/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.nbcnews.com/video/nbc-news/52753292',
u'file': u'52753292.flv',
u'md5': u'47abaac93c6eaf9ad37ee6c4463a5179',
u'info_dict': {
u'title': u'Crew emerges after four-month Mars food study',
u'description': u'md5:24e632ffac72b35f8b67a12d1b6ddfc1',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
info_xml = self._download_webpage('http://www.nbcnews.com/id/%s/displaymode/1219' % video_id, video_id)
info = xml.etree.ElementTree.fromstring(info_xml.encode('utf-8')).find('video')
return {'id': video_id,
'title': info.find('headline').text,
'ext': 'flv',
'url': find_xpath_attr(info, 'media', 'type', 'flashVideo').text,
'description': compat_str(info.find('caption').text),
'thumbnail': find_xpath_attr(info, 'media', 'type', 'thumbnail').text,
}

View File

@@ -0,0 +1,38 @@
import json
import re
from .common import InfoExtractor
from ..utils import determine_ext
class NewgroundsIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?newgrounds\.com/audio/listen/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.newgrounds.com/audio/listen/549479',
u'file': u'549479.mp3',
u'md5': u'fe6033d297591288fa1c1f780386f07a',
u'info_dict': {
u"title": u"B7 - BusMode",
u"uploader": u"Burn7",
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
music_id = mobj.group('id')
webpage = self._download_webpage(url, music_id)
title = self._html_search_regex(r',"name":"([^"]+)",', webpage, u'music title')
uploader = self._html_search_regex(r',"artist":"([^"]+)",', webpage, u'music uploader')
music_url_json_string = self._html_search_regex(r'({"url":"[^"]+"),', webpage, u'music url') + '}'
music_url_json = json.loads(music_url_json_string)
music_url = music_url_json['url']
return {
'id': music_id,
'title': title,
'url': music_url,
'uploader': uploader,
'ext': determine_ext(music_url),
}

120
youtube_dl/extractor/nhl.py Normal file
View File

@@ -0,0 +1,120 @@
import re
import json
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
compat_urllib_parse,
determine_ext,
unified_strdate,
)
class NHLBaseInfoExtractor(InfoExtractor):
@staticmethod
def _fix_json(json_string):
return json_string.replace('\\\'', '\'')
def _extract_video(self, info):
video_id = info['id']
self.report_extraction(video_id)
initial_video_url = info['publishPoint']
data = compat_urllib_parse.urlencode({
'type': 'fvod',
'path': initial_video_url.replace('.mp4', '_sd.mp4'),
})
path_url = 'http://video.nhl.com/videocenter/servlets/encryptvideopath?' + data
path_response = self._download_webpage(path_url, video_id,
u'Downloading final video url')
path_doc = xml.etree.ElementTree.fromstring(path_response)
video_url = path_doc.find('path').text
join = compat_urlparse.urljoin
return {
'id': video_id,
'title': info['name'],
'url': video_url,
'ext': determine_ext(video_url),
'description': info['description'],
'duration': int(info['duration']),
'thumbnail': join(join(video_url, '/u/'), info['bigImage']),
'upload_date': unified_strdate(info['releaseDate'].split('.')[0]),
}
class NHLIE(NHLBaseInfoExtractor):
IE_NAME = u'nhl.com'
_VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/console\?.*?(?<=[?&])id=(?P<id>\d+)'
_TEST = {
u'url': u'http://video.canucks.nhl.com/videocenter/console?catid=6?id=453614',
u'file': u'453614.mp4',
u'info_dict': {
u'title': u'Quick clip: Weise 4-3 goal vs Flames',
u'description': u'Dale Weise scores his first of the season to put the Canucks up 4-3.',
u'duration': 18,
u'upload_date': u'20131006',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
json_url = 'http://video.nhl.com/videocenter/servlets/playlist?ids=%s&format=json' % video_id
info_json = self._download_webpage(json_url, video_id,
u'Downloading info json')
info_json = self._fix_json(info_json)
info = json.loads(info_json)[0]
return self._extract_video(info)
class NHLVideocenterIE(NHLBaseInfoExtractor):
IE_NAME = u'nhl.com:videocenter'
IE_DESC = u'Download the first 12 videos from a videocenter category'
_VALID_URL = r'https?://video\.(?P<team>[^.]*)\.nhl\.com/videocenter/(console\?.*?catid=(?P<catid>[^&]+))?'
@classmethod
def suitable(cls, url):
if NHLIE.suitable(url):
return False
return super(NHLVideocenterIE, cls).suitable(url)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
team = mobj.group('team')
webpage = self._download_webpage(url, team)
cat_id = self._search_regex(
[r'var defaultCatId = "(.+?)";',
r'{statusIndex:0,index:0,.*?id:(.*?),'],
webpage, u'category id')
playlist_title = self._html_search_regex(
r'\?catid=%s">(.*?)</a>' % cat_id,
webpage, u'playlist title', flags=re.DOTALL)
data = compat_urllib_parse.urlencode({
'cid': cat_id,
# This is the default value
'count': 12,
'ptrs': 3,
'format': 'json',
})
path = '/videocenter/servlets/browse?' + data
request_url = compat_urlparse.urljoin(url, path)
response = self._download_webpage(request_url, playlist_title)
response = self._fix_json(response)
if not response.strip():
self._downloader.report_warning(u'Got an empty reponse, trying '
u'adding the "newvideos" parameter')
response = self._download_webpage(request_url + '&newvideos=true',
playlist_title)
response = self._fix_json(response)
videos = json.loads(response)
return {
'_type': 'playlist',
'title': playlist_title,
'id': cat_id,
'entries': [self._extract_video(i) for i in videos],
}

View File

@@ -0,0 +1,43 @@
import re
from .common import InfoExtractor
from ..utils import compat_urlparse
class NowVideoIE(InfoExtractor):
_VALID_URL = r'(?:https?://)?(?:www\.)?nowvideo\.ch/video/(?P<id>\w+)'
_TEST = {
u'url': u'http://www.nowvideo.ch/video/0mw0yow7b6dxa',
u'file': u'0mw0yow7b6dxa.flv',
u'md5': u'f8fbbc8add72bd95b7850c6a02fc8817',
u'info_dict': {
u"title": u"youtubedl test video _BaW_jenozKc.mp4"
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage_url = 'http://www.nowvideo.ch/video/' + video_id
webpage = self._download_webpage(webpage_url, video_id)
self.report_extraction(video_id)
video_title = self._html_search_regex(r'<h4>(.*)</h4>',
webpage, u'video title')
video_key = self._search_regex(r'var fkzd="(.*)";',
webpage, u'video key')
api_call = "http://www.nowvideo.ch/api/player.api.php?file={0}&numOfErrors=0&cid=1&key={1}".format(video_id, video_key)
api_response = self._download_webpage(api_call, video_id,
u'Downloading API page')
video_url = compat_urlparse.parse_qs(api_response)[u'url'][0]
return [{
'id': video_id,
'url': video_url,
'ext': 'flv',
'title': video_title,
}]

View File

@@ -0,0 +1,58 @@
import re
import json
from .common import InfoExtractor
from ..utils import unescapeHTML
class OoyalaIE(InfoExtractor):
_VALID_URL = r'https?://.+?\.ooyala\.com/.*?embedCode=(?P<id>.+?)(&|$)'
_TEST = {
# From http://it.slashdot.org/story/13/04/25/178216/recovering-data-from-broken-hard-drives-and-ssds-video
u'url': u'http://player.ooyala.com/player.js?embedCode=pxczE2YjpfHfn1f3M-ykG_AmJRRn0PD8',
u'file': u'pxczE2YjpfHfn1f3M-ykG_AmJRRn0PD8.mp4',
u'md5': u'3f5cceb3a7bf461d6c29dc466cf8033c',
u'info_dict': {
u'title': u'Explaining Data Recovery from Hard Drives and SSDs',
u'description': u'How badly damaged does a drive have to be to defeat Russell and his crew? Apparently, smashed to bits.',
},
}
@staticmethod
def _url_for_embed_code(embed_code):
return 'http://player.ooyala.com/player.js?embedCode=%s' % embed_code
def _extract_result(self, info, more_info):
return {'id': info['embedCode'],
'ext': 'mp4',
'title': unescapeHTML(info['title']),
'url': info.get('ipad_url') or info['url'],
'description': unescapeHTML(more_info['description']),
'thumbnail': more_info['promo'],
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
embedCode = mobj.group('id')
player_url = 'http://player.ooyala.com/player.js?embedCode=%s' % embedCode
player = self._download_webpage(player_url, embedCode)
mobile_url = self._search_regex(r'mobile_player_url="(.+?)&device="',
player, u'mobile player url')
mobile_player = self._download_webpage(mobile_url, embedCode)
videos_info = self._search_regex(
r'var streams=window.oo_testEnv\?\[\]:eval\("\((\[{.*?}\])\)"\);',
mobile_player, u'info').replace('\\"','"')
videos_more_info = self._search_regex(r'eval\("\(({.*?\\"promo\\".*?})\)"', mobile_player, u'more info').replace('\\"','"')
videos_info = json.loads(videos_info)
videos_more_info =json.loads(videos_more_info)
if videos_more_info.get('lineup'):
videos = [self._extract_result(info, more_info) for (info, more_info) in zip(videos_info, videos_more_info['lineup'])]
return {'_type': 'playlist',
'id': embedCode,
'title': unescapeHTML(videos_more_info['title']),
'entries': videos,
}
else:
return self._extract_result(videos_info[0], videos_more_info)

View File

@@ -0,0 +1,54 @@
# coding: utf-8
import re
import xml.etree.ElementTree
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
ExtractorError,
find_xpath_attr,
)
class ORFIE(InfoExtractor):
_VALID_URL = r'https?://tvthek.orf.at/(programs/.+?/episodes|topics/.+?)/(?P<id>\d+)'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id)
flash_xml = self._search_regex('ORF.flashXML = \'(.+?)\'', webpage, u'flash xml')
flash_xml = compat_urlparse.parse_qs('xml='+flash_xml)['xml'][0]
flash_config = xml.etree.ElementTree.fromstring(flash_xml.encode('utf-8'))
playlist_json = self._search_regex(r'playlist\': \'(\[.*?\])\'', webpage, u'playlist').replace(r'\"','"')
playlist = json.loads(playlist_json)
videos = []
ns = '{http://tempuri.org/XMLSchema.xsd}'
xpath = '%(ns)sPlaylist/%(ns)sItems/%(ns)sItem' % {'ns': ns}
webpage_description = self._og_search_description(webpage)
for (i, (item, info)) in enumerate(zip(flash_config.findall(xpath), playlist), 1):
# Get best quality url
rtmp_url = None
for q in ['Q6A', 'Q4A', 'Q1A']:
video_url = find_xpath_attr(item, '%sVideoUrl' % ns, 'quality', q)
if video_url is not None:
rtmp_url = video_url.text
break
if rtmp_url is None:
raise ExtractorError(u'Couldn\'t get video url: %s' % info['id'])
description = self._html_search_regex(
r'id="playlist_entry_%s".*?<p>(.*?)</p>' % i, webpage,
u'description', default=webpage_description, flags=re.DOTALL)
videos.append({
'_type': 'video',
'id': info['id'],
'title': info['title'],
'url': rtmp_url,
'ext': 'flv',
'description': description,
})
return videos

View File

@@ -0,0 +1,34 @@
import re
import json
from .common import InfoExtractor
class PBSIE(InfoExtractor):
_VALID_URL = r'https?://video.pbs.org/video/(?P<id>\d+)/?'
_TEST = {
u'url': u'http://video.pbs.org/video/2365006249/',
u'file': u'2365006249.mp4',
u'md5': 'ce1888486f0908d555a8093cac9a7362',
u'info_dict': {
u'title': u'A More Perfect Union',
u'description': u'md5:ba0c207295339c8d6eced00b7c363c6a',
u'duration': 3190,
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
info_url = 'http://video.pbs.org/videoInfo/%s?format=json' % video_id
info_page = self._download_webpage(info_url, video_id)
info =json.loads(info_page)
return {'id': video_id,
'title': info['title'],
'url': info['alternate_encoding']['url'],
'ext': 'mp4',
'description': info['program'].get('description'),
'thumbnail': info.get('image_url'),
'duration': info.get('duration'),
}

View File

@@ -38,6 +38,7 @@ class PornotubeIE(InfoExtractor):
VIDEO_UPLOADED_RE = r'<div class="video_added_by">Added (?P<date>[0-9\/]+) by'
upload_date = self._html_search_regex(VIDEO_UPLOADED_RE, webpage, u'upload date', fatal=False)
if upload_date: upload_date = unified_strdate(upload_date)
age_limit = self._rta_search(webpage)
info = {'id': video_id,
'url': video_url,
@@ -45,6 +46,7 @@ class PornotubeIE(InfoExtractor):
'upload_date': upload_date,
'title': video_title,
'ext': 'flv',
'format': 'flv'}
'format': 'flv',
'age_limit': age_limit}
return [info]

View File

@@ -14,24 +14,30 @@ class RedTubeIE(InfoExtractor):
}
}
def _real_extract(self,url):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_extension = 'mp4'
video_extension = 'mp4'
webpage = self._download_webpage(url, video_id)
self.report_extraction(video_id)
video_url = self._html_search_regex(r'<source src="(.+?)" type="video/mp4">',
webpage, u'video URL')
video_url = self._html_search_regex(
r'<source src="(.+?)" type="video/mp4">', webpage, u'video URL')
video_title = self._html_search_regex('<h1 class="videoTitle slidePanelMovable">(.+?)</h1>',
video_title = self._html_search_regex(
r'<h1 class="videoTitle slidePanelMovable">(.+?)</h1>',
webpage, u'title')
return [{
'id': video_id,
'url': video_url,
'ext': video_extension,
'title': video_title,
}]
# No self-labeling, but they describe themselves as
# "Home of Videos Porno"
age_limit = 18
return {
'id': video_id,
'url': video_url,
'ext': video_extension,
'title': video_title,
'age_limit': age_limit,
}

View File

@@ -0,0 +1,42 @@
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
compat_parse_qs,
)
class Ro220IE(InfoExtractor):
IE_NAME = '220.ro'
_VALID_URL = r'(?x)(?:https?://)?(?:www\.)?220\.ro/(?P<category>[^/]+)/(?P<shorttitle>[^/]+)/(?P<video_id>[^/]+)'
_TEST = {
u"url": u"http://www.220.ro/sport/Luati-Le-Banii-Sez-4-Ep-1/LYV6doKo7f/",
u'file': u'LYV6doKo7f.mp4',
u'md5': u'03af18b73a07b4088753930db7a34add',
u'info_dict': {
u"title": u"Luati-le Banii sez 4 ep 1",
u"description": u"Iata-ne reveniti dupa o binemeritata vacanta. Va astept si pe Facebook cu pareri si comentarii.",
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('video_id')
webpage = self._download_webpage(url, video_id)
flashVars_str = self._search_regex(
r'<param name="flashVars" value="([^"]+)"',
webpage, u'flashVars')
flashVars = compat_parse_qs(flashVars_str)
info = {
'_type': 'video',
'id': video_id,
'ext': 'mp4',
'url': flashVars['videoURL'][0],
'title': flashVars['title'][0],
'description': clean_html(flashVars['desc'][0]),
'thumbnail': flashVars['preview'][0],
}
return info

View File

@@ -0,0 +1,16 @@
from .videodetective import VideoDetectiveIE
# It just uses the same method as videodetective.com,
# the internetvideoarchive.com is extracted from the og:video property
class RottenTomatoesIE(VideoDetectiveIE):
_VALID_URL = r'https?://www\.rottentomatoes\.com/m/[^/]+/trailers/(?P<id>\d+)'
_TEST = {
u'url': u'http://www.rottentomatoes.com/m/toy_story_3/trailers/11028566/',
u'file': '613340.mp4',
u'info_dict': {
u'title': u'TOY STORY 3',
u'description': u'From the creators of the beloved TOY STORY films, comes a story that will reunite the gang in a whole new way.',
},
}

View File

@@ -0,0 +1,49 @@
import re
import json
from .common import InfoExtractor
from ..utils import unified_strdate, determine_ext
class RoxwelIE(InfoExtractor):
_VALID_URL = r'https?://www\.roxwel\.com/player/(?P<filename>.+?)(\.|\?|$)'
_TEST = {
u'url': u'http://www.roxwel.com/player/passionpittakeawalklive.html',
u'file': u'passionpittakeawalklive.flv',
u'md5': u'd9dea8360a1e7d485d2206db7fe13035',
u'info_dict': {
u'title': u'Take A Walk (live)',
u'uploader': u'Passion Pit',
u'description': u'Passion Pit performs "Take A Walk\" live at The Backyard in Austin, Texas. ',
},
u'skip': u'Requires rtmpdump',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
filename = mobj.group('filename')
info_url = 'http://www.roxwel.com/api/videos/%s' % filename
info_page = self._download_webpage(info_url, filename,
u'Downloading video info')
self.report_extraction(filename)
info = json.loads(info_page)
rtmp_rates = sorted([int(r.replace('flv_', '')) for r in info['media_rates'] if r.startswith('flv_')])
best_rate = rtmp_rates[-1]
url_page_url = 'http://roxwel.com/pl_one_time.php?filename=%s&quality=%s' % (filename, best_rate)
rtmp_url = self._download_webpage(url_page_url, filename, u'Downloading video url')
ext = determine_ext(rtmp_url)
if ext == 'f4v':
rtmp_url = rtmp_url.replace(filename, 'mp4:%s' % filename)
return {'id': filename,
'title': info['title'],
'url': rtmp_url,
'ext': 'flv',
'description': info['description'],
'thumbnail': info.get('player_image_url') or info.get('image_url_large'),
'uploader': info['artist'],
'uploader_id': info['artistname'],
'upload_date': unified_strdate(info['dbdate']),
}

View File

@@ -0,0 +1,156 @@
# encoding: utf-8
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
ExtractorError,
)
class RTLnowIE(InfoExtractor):
"""Information Extractor for RTL NOW, RTL2 NOW, RTL NITRO, SUPER RTL NOW, VOX NOW and n-tv NOW"""
_VALID_URL = r'(?:http://)?(?P<url>(?P<base_url>rtl-now\.rtl\.de/|rtl2now\.rtl2\.de/|(?:www\.)?voxnow\.de/|(?:www\.)?rtlnitronow\.de/|(?:www\.)?superrtlnow\.de/|(?:www\.)?n-tvnow\.de/)[a-zA-Z0-9-]+/[a-zA-Z0-9-]+\.php\?(?:container_id|film_id)=(?P<video_id>[0-9]+)&player=1(?:&season=[0-9]+)?(?:&.*)?)'
_TESTS = [{
u'url': u'http://rtl-now.rtl.de/ahornallee/folge-1.php?film_id=90419&player=1&season=1',
u'file': u'90419.flv',
u'info_dict': {
u'upload_date': u'20070416',
u'title': u'Ahornallee - Folge 1 - Der Einzug',
u'description': u'Folge 1 - Der Einzug',
},
u'params': {
u'skip_download': True,
},
u'skip': u'Only works from Germany',
},
{
u'url': u'http://rtl2now.rtl2.de/aerger-im-revier/episode-15-teil-1.php?film_id=69756&player=1&season=2&index=5',
u'file': u'69756.flv',
u'info_dict': {
u'upload_date': u'20120519',
u'title': u'Ärger im Revier - Ein junger Ladendieb, ein handfester Streit...',
u'description': u'Ärger im Revier - Ein junger Ladendieb, ein handfester Streit u.a.',
u'thumbnail': u'http://autoimg.static-fra.de/rtl2now/219850/1500x1500/image2.jpg',
},
u'params': {
u'skip_download': True,
},
u'skip': u'Only works from Germany',
},
{
u'url': u'www.voxnow.de/voxtours/suedafrika-reporter-ii.php?film_id=13883&player=1&season=17',
u'file': u'13883.flv',
u'info_dict': {
u'upload_date': u'20090627',
u'title': u'Voxtours - Südafrika-Reporter II',
u'description': u'Südafrika-Reporter II',
},
u'params': {
u'skip_download': True,
},
},
{
u'url': u'http://superrtlnow.de/medicopter-117/angst.php?film_id=99205&player=1',
u'file': u'99205.flv',
u'info_dict': {
u'upload_date': u'20080928',
u'title': u'Medicopter 117 - Angst!',
u'description': u'Angst!',
u'thumbnail': u'http://autoimg.static-fra.de/superrtlnow/287529/1500x1500/image2.jpg'
},
u'params': {
u'skip_download': True,
},
},
{
u'url': u'http://www.rtlnitronow.de/recht-ordnung/lebensmittelkontrolle-erlangenordnungsamt-berlin.php?film_id=127367&player=1&season=1',
u'file': u'127367.flv',
u'info_dict': {
u'upload_date': u'20130926',
u'title': u'Recht & Ordnung - Lebensmittelkontrolle Erlangen/Ordnungsamt...',
u'description': u'Lebensmittelkontrolle Erlangen/Ordnungsamt Berlin',
u'thumbnail': u'http://autoimg.static-fra.de/nitronow/344787/1500x1500/image2.jpg',
},
u'params': {
u'skip_download': True,
},
},
{
u'url': u'http://www.n-tvnow.de/top-gear/episode-1-2013-01-01-00-00-00.php?film_id=124903&player=1&season=10',
u'file': u'124903.flv',
u'info_dict': {
u'upload_date': u'20130101',
u'title': u'Top Gear vom 01.01.2013',
u'description': u'Episode 1',
},
u'params': {
u'skip_download': True,
},
u'skip': u'Only works from Germany',
}]
def _real_extract(self,url):
mobj = re.match(self._VALID_URL, url)
webpage_url = u'http://' + mobj.group('url')
video_page_url = u'http://' + mobj.group('base_url')
video_id = mobj.group(u'video_id')
webpage = self._download_webpage(webpage_url, video_id)
note_m = re.search(r'''(?sx)
<div[ ]style="margin-left:[ ]20px;[ ]font-size:[ ]13px;">(.*?)
<div[ ]id="playerteaser">''', webpage)
if note_m:
msg = clean_html(note_m.group(1))
raise ExtractorError(msg)
video_title = self._html_search_regex(r'<title>(?P<title>[^<]+?)( \| [^<]*)?</title>',
webpage, u'title')
playerdata_url = self._html_search_regex(r'\'playerdata\': \'(?P<playerdata_url>[^\']+)\'',
webpage, u'playerdata_url')
playerdata = self._download_webpage(playerdata_url, video_id)
mobj = re.search(r'<title><!\[CDATA\[(?P<description>.+?)(?:\s+- (?:Sendung )?vom (?P<upload_date_d>[0-9]{2})\.(?P<upload_date_m>[0-9]{2})\.(?:(?P<upload_date_Y>[0-9]{4})|(?P<upload_date_y>[0-9]{2})) [0-9]{2}:[0-9]{2} Uhr)?\]\]></title>', playerdata)
if mobj:
video_description = mobj.group(u'description')
if mobj.group('upload_date_Y'):
video_upload_date = mobj.group('upload_date_Y')
elif mobj.group('upload_date_y'):
video_upload_date = u'20' + mobj.group('upload_date_y')
else:
video_upload_date = None
if video_upload_date:
video_upload_date += mobj.group('upload_date_m')+mobj.group('upload_date_d')
else:
video_description = None
video_upload_date = None
self._downloader.report_warning(u'Unable to extract description and upload date')
# Thumbnail: not every video has an thumbnail
mobj = re.search(r'<meta property="og:image" content="(?P<thumbnail>[^"]+)">', webpage)
if mobj:
video_thumbnail = mobj.group(u'thumbnail')
else:
video_thumbnail = None
mobj = re.search(r'<filename [^>]+><!\[CDATA\[(?P<url>rtmpe://(?:[^/]+/){2})(?P<play_path>[^\]]+)\]\]></filename>', playerdata)
if mobj is None:
raise ExtractorError(u'Unable to extract media URL')
video_url = mobj.group(u'url')
video_play_path = u'mp4:' + mobj.group(u'play_path')
video_player_url = video_page_url + u'includes/vodplayer.swf'
return [{
'id': video_id,
'url': video_url,
'play_path': video_play_path,
'page_url': video_page_url,
'player_url': video_player_url,
'ext': 'flv',
'title': video_title,
'description': video_description,
'upload_date': video_upload_date,
'thumbnail': video_thumbnail,
}]

View File

@@ -0,0 +1,23 @@
import re
from .common import InfoExtractor
class SlashdotIE(InfoExtractor):
_VALID_URL = r'https?://tv.slashdot.org/video/\?embed=(?P<id>.*?)(&|$)'
_TEST = {
u'url': u'http://tv.slashdot.org/video/?embed=JscHMzZDplD0p-yNLOzTfzC3Q3xzJaUz',
u'file': u'JscHMzZDplD0p-yNLOzTfzC3Q3xzJaUz.mp4',
u'md5': u'd2222e7a4a4c1541b3e0cf732fb26735',
u'info_dict': {
u'title': u' Meet the Stampede Supercomputing Cluster\'s Administrator',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
ooyala_url = self._search_regex(r'<script src="(.*?)"', webpage, 'ooyala url')
return self.url_result(ooyala_url, 'Ooyala')

View File

@@ -0,0 +1,47 @@
import re
import json
from .common import InfoExtractor
from ..utils import (
compat_urlparse,
ExtractorError,
)
class SlideshareIE(InfoExtractor):
_VALID_URL = r'https?://www\.slideshare\.net/[^/]+?/(?P<title>.+?)($|\?)'
_TEST = {
u'url': u'http://www.slideshare.net/Dataversity/keynote-presentation-managing-scale-and-complexity',
u'file': u'25665706.mp4',
u'info_dict': {
u'title': u'Managing Scale and Complexity',
u'description': u'This was a keynote presentation at the NoSQL Now! 2013 Conference & Expo (http://www.nosqlnow.com). This presentation was given by Adrian Cockcroft from Netflix',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
page_title = mobj.group('title')
webpage = self._download_webpage(url, page_title)
slideshare_obj = self._search_regex(
r'var slideshare_object = ({.*?}); var user_info =',
webpage, u'slideshare object')
info = json.loads(slideshare_obj)
if info['slideshow']['type'] != u'video':
raise ExtractorError(u'Webpage type is "%s": only video extraction is supported for Slideshare' % info['slideshow']['type'], expected=True)
doc = info['doc']
bucket = info['jsplayer']['video_bucket']
ext = info['jsplayer']['video_extension']
video_url = compat_urlparse.urljoin(bucket, doc + '-SD.' + ext)
return {
'_type': 'video',
'id': info['slideshow']['id'],
'title': info['slideshow']['title'],
'ext': ext,
'url': video_url,
'thumbnail': info['slideshow']['pin_image_url'],
'description': self._og_search_description(webpage),
}

View File

@@ -0,0 +1,94 @@
# encoding: utf-8
import json
import re
from .common import InfoExtractor
from ..utils import ExtractorError
class SohuIE(InfoExtractor):
_VALID_URL = r'https?://(?P<mytv>my\.)?tv\.sohu\.com/.+?/(?(mytv)|n)(?P<id>\d+)\.shtml.*?'
_TEST = {
u'url': u'http://tv.sohu.com/20130724/n382479172.shtml#super',
u'file': u'382479172.mp4',
u'md5': u'bde8d9a6ffd82c63a1eefaef4eeefec7',
u'info_dict': {
u'title': u'MVFar East Movement《The Illest》',
},
}
def _real_extract(self, url):
def _fetch_data(vid_id, mytv=False):
if mytv:
base_data_url = 'http://my.tv.sohu.com/play/videonew.do?vid='
else:
base_data_url = u'http://hot.vrs.sohu.com/vrs_flash.action?vid='
data_url = base_data_url + str(vid_id)
data_json = self._download_webpage(
data_url, video_id,
note=u'Downloading JSON data for ' + str(vid_id))
return json.loads(data_json)
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
mytv = mobj.group('mytv') is not None
webpage = self._download_webpage(url, video_id)
raw_title = self._html_search_regex(r'(?s)<title>(.+?)</title>',
webpage, u'video title')
title = raw_title.partition('-')[0].strip()
vid = self._html_search_regex(r'var vid ?= ?["\'](\d+)["\']', webpage,
u'video path')
data = _fetch_data(vid, mytv)
QUALITIES = ('ori', 'super', 'high', 'nor')
vid_ids = [data['data'][q + 'Vid']
for q in QUALITIES
if data['data'][q + 'Vid'] != 0]
if not vid_ids:
raise ExtractorError(u'No formats available for this video')
# For now, we just pick the highest available quality
vid_id = vid_ids[-1]
format_data = data if vid == vid_id else _fetch_data(vid_id, mytv)
part_count = format_data['data']['totalBlocks']
allot = format_data['allot']
prot = format_data['prot']
clipsURL = format_data['data']['clipsURL']
su = format_data['data']['su']
playlist = []
for i in range(part_count):
part_url = ('http://%s/?prot=%s&file=%s&new=%s' %
(allot, prot, clipsURL[i], su[i]))
part_str = self._download_webpage(
part_url, video_id,
note=u'Downloading part %d of %d' % (i+1, part_count))
part_info = part_str.split('|')
video_url = '%s%s?key=%s' % (part_info[0], su[i], part_info[3])
video_info = {
'id': '%s_part%02d' % (video_id, i + 1),
'title': title,
'url': video_url,
'ext': 'mp4',
}
playlist.append(video_info)
if len(playlist) == 1:
info = playlist[0]
info['id'] = video_id
else:
info = {
'_type': 'playlist',
'entries': playlist,
'id': video_id,
}
return info

View File

@@ -1,9 +1,12 @@
import json
import re
import itertools
from .common import InfoExtractor
from ..utils import (
compat_str,
compat_urlparse,
compat_urllib_parse,
ExtractorError,
unified_strdate,
@@ -19,7 +22,12 @@ class SoundcloudIE(InfoExtractor):
of the stream token and uid
"""
_VALID_URL = r'^(?:https?://)?(?:www\.)?soundcloud\.com/([\w\d-]+)/([\w\d-]+)/?(?:[?].*)?$'
_VALID_URL = r'''^(?:https?://)?
(?:(?:(?:www\.)?soundcloud\.com/([\w\d-]+)/([\w\d-]+)/?(?:[?].*)?$)
|(?:api\.soundcloud\.com/tracks/(?P<track_id>\d+))
|(?P<widget>w.soundcloud.com/player/?.*?url=.*)
)
'''
IE_NAME = u'soundcloud'
_TEST = {
u'url': u'http://soundcloud.com/ethmusic/lostin-powers-she-so-heavy',
@@ -33,59 +41,69 @@ class SoundcloudIE(InfoExtractor):
}
}
_CLIENT_ID = 'b45b1aa10f1ac2941910a7f0d10f8e28'
@classmethod
def suitable(cls, url):
return re.match(cls._VALID_URL, url, flags=re.VERBOSE) is not None
def report_resolve(self, video_id):
"""Report information extraction."""
self.to_screen(u'%s: Resolving id' % video_id)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
@classmethod
def _resolv_url(cls, url):
return 'http://api.soundcloud.com/resolve.json?url=' + url + '&client_id=' + cls._CLIENT_ID
# extract uploader (which is in the url)
uploader = mobj.group(1)
# extract simple title (uploader + slug of song title)
slug_title = mobj.group(2)
full_title = '%s/%s' % (uploader, slug_title)
self.report_resolve(full_title)
url = 'http://soundcloud.com/%s/%s' % (uploader, slug_title)
resolv_url = 'http://api.soundcloud.com/resolve.json?url=' + url + '&client_id=b45b1aa10f1ac2941910a7f0d10f8e28'
info_json = self._download_webpage(resolv_url, full_title, u'Downloading info JSON')
info = json.loads(info_json)
def _extract_info_dict(self, info, full_title=None, quiet=False):
video_id = info['id']
self.report_extraction(full_title)
name = full_title or video_id
if quiet == False:
self.report_extraction(name)
streams_url = 'https://api.sndcdn.com/i1/tracks/' + str(video_id) + '/streams?client_id=b45b1aa10f1ac2941910a7f0d10f8e28'
stream_json = self._download_webpage(streams_url, full_title,
u'Downloading stream definitions',
u'unable to download stream definitions')
streams = json.loads(stream_json)
mediaURL = streams['http_mp3_128_url']
upload_date = unified_strdate(info['created_at'])
return [{
thumbnail = info['artwork_url']
if thumbnail is not None:
thumbnail = thumbnail.replace('-large', '-t500x500')
return {
'id': info['id'],
'url': mediaURL,
'url': info['stream_url'] + '?client_id=' + self._CLIENT_ID,
'uploader': info['user']['username'],
'upload_date': upload_date,
'upload_date': unified_strdate(info['created_at']),
'title': info['title'],
'ext': u'mp3',
'description': info['description'],
}]
'thumbnail': thumbnail,
}
class SoundcloudSetIE(InfoExtractor):
"""Information extractor for soundcloud.com sets
To access the media, the uid of the song and a stream token
must be extracted from the page source and the script must make
a request to media.soundcloud.com/crossdomain.xml. Then
the media can be grabbed by requesting from an url composed
of the stream token and uid
"""
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url, flags=re.VERBOSE)
if mobj is None:
raise ExtractorError(u'Invalid URL: %s' % url)
track_id = mobj.group('track_id')
if track_id is not None:
info_json_url = 'http://api.soundcloud.com/tracks/' + track_id + '.json?client_id=' + self._CLIENT_ID
full_title = track_id
elif mobj.group('widget'):
query = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
return self.url_result(query['url'][0], ie='Soundcloud')
else:
# extract uploader (which is in the url)
uploader = mobj.group(1)
# extract simple title (uploader + slug of song title)
slug_title = mobj.group(2)
full_title = '%s/%s' % (uploader, slug_title)
self.report_resolve(full_title)
url = 'http://soundcloud.com/%s/%s' % (uploader, slug_title)
info_json_url = self._resolv_url(url)
info_json = self._download_webpage(info_json_url, full_title, u'Downloading info JSON')
info = json.loads(info_json)
return self._extract_info_dict(info, full_title)
class SoundcloudSetIE(SoundcloudIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?soundcloud\.com/([\w\d-]+)/sets/([\w\d-]+)(?:[?].*)?$'
IE_NAME = u'soundcloud:set'
_TEST = {
@@ -153,10 +171,6 @@ class SoundcloudSetIE(InfoExtractor):
]
}
def report_resolve(self, video_id):
"""Report information extraction."""
self.to_screen(u'%s: Resolving id' % video_id)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
if mobj is None:
@@ -171,7 +185,7 @@ class SoundcloudSetIE(InfoExtractor):
self.report_resolve(full_title)
url = 'http://soundcloud.com/%s/sets/%s' % (uploader, slug_title)
resolv_url = 'http://api.soundcloud.com/resolve.json?url=' + url + '&client_id=b45b1aa10f1ac2941910a7f0d10f8e28'
resolv_url = self._resolv_url(url)
info_json = self._download_webpage(resolv_url, full_title)
videos = []
@@ -182,23 +196,46 @@ class SoundcloudSetIE(InfoExtractor):
return
self.report_extraction(full_title)
for track in info['tracks']:
video_id = track['id']
return {'_type': 'playlist',
'entries': [self._extract_info_dict(track) for track in info['tracks']],
'id': info['id'],
'title': info['title'],
}
streams_url = 'https://api.sndcdn.com/i1/tracks/' + str(video_id) + '/streams?client_id=b45b1aa10f1ac2941910a7f0d10f8e28'
stream_json = self._download_webpage(streams_url, video_id, u'Downloading track info JSON')
self.report_extraction(video_id)
streams = json.loads(stream_json)
mediaURL = streams['http_mp3_128_url']
class SoundcloudUserIE(SoundcloudIE):
_VALID_URL = r'https?://(www\.)?soundcloud.com/(?P<user>[^/]+)(/?(tracks/)?)?(\?.*)?$'
IE_NAME = u'soundcloud:user'
videos.append({
'id': video_id,
'url': mediaURL,
'uploader': track['user']['username'],
'upload_date': unified_strdate(track['created_at']),
'title': track['title'],
'ext': u'mp3',
'description': track['description'],
})
return videos
# it's in tests/test_playlists.py
_TEST = None
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
uploader = mobj.group('user')
url = 'http://soundcloud.com/%s/' % uploader
resolv_url = self._resolv_url(url)
user_json = self._download_webpage(resolv_url, uploader,
u'Downloading user info')
user = json.loads(user_json)
tracks = []
for i in itertools.count():
data = compat_urllib_parse.urlencode({'offset': i*50,
'client_id': self._CLIENT_ID,
})
tracks_url = 'http://api.soundcloud.com/users/%s/tracks.json?' % user['id'] + data
response = self._download_webpage(tracks_url, uploader,
u'Downloading tracks page %s' % (i+1))
new_tracks = json.loads(response)
tracks.extend(self._extract_info_dict(track, quiet=True) for track in new_tracks)
if len(new_tracks) < 50:
break
return {
'_type': 'playlist',
'id': compat_str(user['id']),
'title': user['username'],
'entries': tracks,
}

View File

@@ -0,0 +1,38 @@
import re
from .mtv import MTVIE, _media_xml_tag
class SouthParkStudiosIE(MTVIE):
IE_NAME = u'southparkstudios.com'
_VALID_URL = r'https?://www\.southparkstudios\.com/(clips|full-episodes)/(?P<id>.+?)(\?|#|$)'
_FEED_URL = 'http://www.southparkstudios.com/feeds/video-player/mrss'
_TEST = {
u'url': u'http://www.southparkstudios.com/clips/104437/bat-daded#tab=featured',
u'file': u'a7bff6c2-ed00-11e0-aca6-0026b9414f30.mp4',
u'info_dict': {
u'title': u'Bat Daded',
u'description': u'Randy disqualifies South Park by getting into a fight with Bat Dad.',
},
}
# Overwrite MTVIE properties we don't want
_TESTS = []
def _get_thumbnail_url(self, uri, itemdoc):
search_path = '%s/%s' % (_media_xml_tag('group'), _media_xml_tag('thumbnail'))
thumb_node = itemdoc.find(search_path)
if thumb_node is None:
return None
else:
return thumb_node.attrib['url']
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
mgid = self._search_regex(r'swfobject.embedSWF\(".*?(mgid:.*?)"',
webpage, u'mgid')
return self._get_videos_info(mgid)

View File

@@ -5,13 +5,13 @@ from .common import InfoExtractor
class StatigramIE(InfoExtractor):
_VALID_URL = r'(?:http://)?(?:www\.)?statigr\.am/p/([^/]+)'
_TEST = {
u'url': u'http://statigr.am/p/484091715184808010_284179915',
u'file': u'484091715184808010_284179915.mp4',
u'md5': u'deda4ff333abe2e118740321e992605b',
u'url': u'http://statigr.am/p/522207370455279102_24101272',
u'file': u'522207370455279102_24101272.mp4',
u'md5': u'6eb93b882a3ded7c378ee1d6884b1814',
u'info_dict': {
u"uploader_id": u"videoseconds",
u"title": u"Instagram photo by @videoseconds"
}
u'uploader_id': u'aguynamedpatrick',
u'title': u'Instagram photo by @aguynamedpatrick (Patrick Janelle)',
},
}
def _real_extract(self, url):

View File

@@ -0,0 +1,91 @@
from .common import InfoExtractor
from ..utils import (
compat_str,
ExtractorError,
)
class SubtitlesInfoExtractor(InfoExtractor):
@property
def _have_to_download_any_subtitles(self):
return any([self._downloader.params.get('writesubtitles', False),
self._downloader.params.get('writeautomaticsub')])
def _list_available_subtitles(self, video_id, webpage=None):
""" outputs the available subtitles for the video """
sub_lang_list = self._get_available_subtitles(video_id)
auto_captions_list = self._get_available_automatic_caption(video_id, webpage)
sub_lang = ",".join(list(sub_lang_list.keys()))
self.to_screen(u'%s: Available subtitles for video: %s' %
(video_id, sub_lang))
auto_lang = ",".join(auto_captions_list.keys())
self.to_screen(u'%s: Available automatic captions for video: %s' %
(video_id, auto_lang))
def extract_subtitles(self, video_id, video_webpage=None):
"""
returns {sub_lang: sub} ,{} if subtitles not found or None if the
subtitles aren't requested.
"""
if not self._have_to_download_any_subtitles:
return None
available_subs_list = {}
if self._downloader.params.get('writeautomaticsub', False):
available_subs_list.update(self._get_available_automatic_caption(video_id, video_webpage))
if self._downloader.params.get('writesubtitles', False):
available_subs_list.update(self._get_available_subtitles(video_id))
if not available_subs_list: # error, it didn't get the available subtitles
return {}
if self._downloader.params.get('allsubtitles', False):
sub_lang_list = available_subs_list
else:
if self._downloader.params.get('subtitleslangs', False):
requested_langs = self._downloader.params.get('subtitleslangs')
elif 'en' in available_subs_list:
requested_langs = ['en']
else:
requested_langs = [list(available_subs_list.keys())[0]]
sub_lang_list = {}
for sub_lang in requested_langs:
if not sub_lang in available_subs_list:
self._downloader.report_warning(u'no closed captions found in the specified language "%s"' % sub_lang)
continue
sub_lang_list[sub_lang] = available_subs_list[sub_lang]
subtitles = {}
for sub_lang, url in sub_lang_list.items():
subtitle = self._request_subtitle_url(sub_lang, url)
if subtitle:
subtitles[sub_lang] = subtitle
return subtitles
def _request_subtitle_url(self, sub_lang, url):
""" makes the http request for the subtitle """
try:
sub = self._download_webpage(url, None, note=False)
except ExtractorError as err:
self._downloader.report_warning(u'unable to download video subtitles for %s: %s' % (sub_lang, compat_str(err)))
return
if not sub:
self._downloader.report_warning(u'Did not fetch video subtitles')
return
return sub
def _get_available_subtitles(self, video_id):
"""
returns {sub_lang: url} or {} if not available
Must be redefined by the subclasses
"""
pass
def _get_available_automatic_caption(self, video_id, webpage):
"""
returns {sub_lang: url} or {} if not available
Must be redefined by the subclasses that support automatic captions,
otherwise it will return {}
"""
self._downloader.report_warning(u'Automatic Captions not supported by this server')
return {}

Some files were not shown because too many files have changed in this diff Show More