Compare commits

...

257 Commits

Author SHA1 Message Date
Philipp Hagemeister
c41cf65d4a release 2016.04.06 2016-04-06 15:13:08 +02:00
Jaime Marquínez Ferrándiz
ec4a4c6fcc Makefile: remove ISSUE_TEMPLATE.md from the 'all' target (fixes #9088)
It isn't included in the tar file, causing build failures.
Since it's only used for GitHub, I think we don't need to store it in the tar file.
2016-04-06 14:16:05 +02:00
Jaime Marquínez Ferrándiz
be0c7009fb Makefile: use full path for the ISSUE_TEMPLATE.md file 2016-04-06 14:09:31 +02:00
Yen Chi Hsuan
92d5477d84 [compat] Handle tuples properly in urlencode()
Fixes #9055
2016-04-06 18:29:54 +08:00
Yen Chi Hsuan
8790249c68 [iqiyi] Improve error detection for VIP-only videos
Closes #9071
2016-04-06 16:12:16 +08:00
Philipp Hagemeister
416930d450 release 2016.04.05 2016-04-05 18:36:24 +02:00
Sergey M․
65150b41bb [deezer] Fix extraction (Closes #9086) 2016-04-05 22:27:33 +06:00
Sergey M․
e42f413716 [rte] Improve thumbnail extraction (Closes #9085) 2016-04-05 22:23:20 +06:00
Sergey M․
40a056d85d [extractor/__init__] Remove novamov extractor and sort novamov based extractors alphabetically 2016-04-05 21:54:09 +06:00
Sergey M․
e7d77efb9d [auroravid] Add extractor (Closes #9070) 2016-04-05 21:52:07 +06:00
Sergey M․
995cf05c96 [novamov] Make title fatal 2016-04-05 21:40:43 +06:00
Jaime Marquínez Ferrándiz
5bf28d7864 [utils] dfxp2srt: add additional namespace
Used by the ZDF subtitles (#9081).
2016-04-04 20:46:35 +02:00
Jaime Marquínez Ferrándiz
8c7d6e8e22 [zdf] Extract subtitles (closes #9081) 2016-04-04 20:44:06 +02:00
Sergey M․
6d4fc66bfc [youtube] Add support for zwearz (Closes #9062) 2016-04-04 02:26:20 +06:00
remitamine
23576edbfc [brightcove:legacy] skip None value for uploader_id 2016-04-02 21:31:21 +01:00
remitamine
4d4cd35f48 [brightcove:legacy] extract uploader_id as a string 2016-04-02 20:55:44 +01:00
remitamine
3aac9b2fb1 [nowness] update tests 2016-04-02 18:57:15 +01:00
remitamine
e47d19e991 [brightcove:new] extract subtitles and strip video title 2016-04-02 18:57:15 +01:00
remitamine
41f5492fbc [brightcove:legacy] improve format extraction and extract uploader_id, duration and timestamp 2016-04-02 18:57:15 +01:00
Jaime Marquínez Ferrándiz
2defa7d75a [instagram:user] Fix extraction (fixes #9059)
The URL for the next page was incorrect and we always got the same page, therefore it got trapped in an infinite loop.
2016-04-02 18:03:56 +02:00
Sergey M․
bbc26c8a01 [bbc] Set vcodec to none for audio formats 2016-04-02 19:00:38 +06:00
Sergey M․
b507cc925b [extractor/common] Carry long line 2016-04-02 18:49:58 +06:00
Sergey M․
db8ee7ec05 [extractor/common] Fix numeric identifiers conversion in DASH URL templates 2016-04-02 18:48:05 +06:00
remitamine
08136dc138 [brightcove] fix format sorting 2016-04-02 10:57:57 +01:00
remitamine
fe7ef95e91 [cbsinteractive] Add support for ZDNet videos 2016-04-01 23:53:32 +01:00
remitamine
5f705baf5e [cnet] extract more formats 2016-04-01 20:42:15 +01:00
remitamine
0750b2491f [ffmpeg] try to convert tt subtitles usng dfxp2srt 2016-04-01 19:47:49 +01:00
remitamine
df634be2ed [common] prefer using mime type over ext for smil subtitle extraction
the subtitle ext for http://www.cnet.com/videos/download-amazon-prime-movies-and-tv/
is adb_xml while using the mime type it get tt(application/smptett+xml)
2016-04-01 19:47:49 +01:00
Jaime Marquínez Ferrándiz
6d628fafca [camwithher] Remove extra blank line 2016-04-01 20:45:21 +02:00
Jaime Marquínez Ferrándiz
0f28777f58 [cbsnews] Remove unused import 2016-04-01 20:43:14 +02:00
Jaime Marquínez Ferrándiz
329c1eae54 [aenetworks] Make pep8 happy 2016-04-01 20:42:19 +02:00
Sergey M․
9aaaf8e8e8 [camwithher] Improve extraction (Closes #8989) 2016-04-01 23:47:27 +06:00
theGeekPirate
04819db58e [camwithher] Add extractor
Corrected unnecessary test

Sane variable naming

RTMP all .flv & url_id for _download_webpage()

Corrected all outstanding issues, next up is a squash!
2016-04-01 23:44:25 +06:00
remitamine
79ba9140dc [theplatform] extract timestamp and uploader 2016-04-01 18:07:17 +01:00
Sergey M․
75d572e9fb [screencast] Improve title regexes (Closes #9025) 2016-04-01 23:01:55 +06:00
Martin Trigaux
791d6aaecc screencast.com: fallback on page title
When determining the title of the page, use the <title> tag of the page
2016-04-01 23:00:52 +06:00
Sergey M․
81de73e5b4 [screencast] Add test 2016-04-01 23:00:45 +06:00
Martin Trigaux
83cedc1cf2 screencast.com: support missing www
The "www." part of the URL is not mandatory
2016-04-01 22:58:16 +06:00
Sergey M․
244cd04237 [pluralsight] Remove unnecessary login/password encode 2016-04-01 22:46:46 +06:00
Sergey M․
fbdaced256 [lynda] Remove unnecessary login/password encode 2016-04-01 22:45:20 +06:00
Sergey M․
a3373823e1 [udemy] Remove unnecessary login/password encode
This is now covered by compat_urllib_parse_urlencode
2016-04-01 22:42:09 +06:00
Sergey M․
03caa463e7 [udemy:course] Skip non-video lectures 2016-04-01 22:38:56 +06:00
remitamine
3f64379eda [movieclips] fix extraction 2016-04-01 16:22:06 +01:00
remitamine
3e0c3d14d9 [cbs] add base extractor 2016-04-01 10:12:29 +01:00
remitamine
d8873d4def [aenetworks] improve format extraction 2016-04-01 09:58:02 +01:00
remitamine
db1c969da5 [theplatform] sign https urls 2016-04-01 09:58:02 +01:00
Philipp Hagemeister
1e02bc7ba2 release 2016.04.01 2016-04-01 09:07:40 +02:00
remitamine
63c55e9f22 [cbs] improve extraction(closes #6321) 2016-04-01 07:33:37 +01:00
remitamine
f9b1529af8 [generic] remove sbnation test(handled by VoxMediaIE) 2016-03-31 23:50:45 +01:00
remitamine
961fc024d2 [voxmedia] improve sbnation support 2016-03-31 23:33:36 +01:00
Sergey M․
b53a06e3b9 [udemy:course] Use new URL format 2016-04-01 02:24:22 +06:00
remitamine
4ecc1fc638 [howstuffworks] improve extraction 2016-03-31 21:11:58 +01:00
Yen Chi Hsuan
5b012dfce8 [tudou] Improve error handling (closes #8988) 2016-04-01 01:42:16 +08:00
remitamine
8369942773 [voxmedia] Add new extractor(closes #3182) 2016-03-31 18:36:41 +01:00
Sergey M․
86f3b66cec [udemy] Remove unused import 2016-03-31 23:00:11 +06:00
Sergey M․
6bb4600717 [udemy:course] Simplify course curriculum downloading 2016-03-31 22:59:19 +06:00
Sergey M․
41d06b0424 [extractor/common] Improve _request_webpage
* Do not ignore data, headers and query for Requests
* Default values for headers and query switched to dicts since these are used by urllib itself
2016-03-31 22:58:38 +06:00
Sergey M․
15d260ebaa [utils] Use update_Request in http_request 2016-03-31 22:55:49 +06:00
Sergey M․
ed0291d153 [utils] Add update_Request 2016-03-31 22:55:01 +06:00
Sergey M․
81da8cbc45 [udemy] Switch to api 2.0 (Closes #9035) 2016-03-31 22:05:25 +06:00
Sergey M․
5299bc3f91 [beeg] Switch to api v6 (Closes #9036) 2016-03-31 20:42:41 +06:00
remitamine
c9c39c22c5 [nationalgeographic] add support for channel.nationalgeographic.com urls 2016-03-31 13:47:38 +01:00
remitamine
d84b48e3f1 [nationalgeographic] improve extraction 2016-03-31 13:44:55 +01:00
remitamine
dd17041c82 [tenplay] remove extractor(fixes #6927) 2016-03-31 12:02:04 +01:00
remitamine
fea7295b14 [brightcove] relax embed_in_page regex 2016-03-31 10:48:22 +01:00
remitamine
9cf01f7f30 [nbc] add new extractor for csnne.com(#5432) 2016-03-31 00:26:42 +01:00
remitamine
ce548296fe [cnbc] fix test 2016-03-31 00:25:11 +01:00
remitamine
c02ec7d430 [cnbc] Add new extractor(closes #8012) 2016-03-30 23:18:31 +01:00
remitamine
6b820a2376 [myspace] improve extraction 2016-03-30 21:18:07 +01:00
Yen Chi Hsuan
e621a344e6 [kwuo] Port to new API and enable --cn-verification-proxy 2016-03-31 02:27:52 +08:00
Yen Chi Hsuan
3ae6f8fec1 [kwuo] Remove _sort_formats() from KuwoBaseIE._get_formats()
Following the idea proposed in 19dbaeece3
2016-03-31 02:11:21 +08:00
Yen Chi Hsuan
597d52fadb [kuwo:song] Correct song ID extraction (fixes #9033)
Bug introduced in daef04a4e7.
2016-03-31 02:00:50 +08:00
Sergey M․
afca767d19 [tumblr] Improve _VALID_URL (Closes #9027) 2016-03-30 22:26:43 +06:00
remitamine
6e359a1534 [comcarcoff] don not depend on crackle extractor(closes #8995)
previously extraction has been delegated to crackle to extract more info
and subtitles #6106 but some of the episodes can't be extracted using
crackle #8995.
2016-03-30 12:27:00 +01:00
Sergey M․
607619bc90 Add manually generated ISSUE_TEMPLATE.md
In order not to wait for the next release
2016-03-29 22:04:29 +06:00
Sergey M․
0b7bfc9422 Improve ISSUE_TEMPLATE_tmpl.md 2016-03-29 22:02:42 +06:00
Sergey M․
7168a6c874 [devscripts/make_issue_template] Fix __version__ again 2016-03-29 03:05:15 +06:00
Sergey M․
034947dd1e Rename ISSUE_TEMPLATE.tmpl in order not to be picked up by github 2016-03-29 02:48:04 +06:00
Sergey M․
3c0de33ad7 Remove ISSUE_TEMPLATE.md 2016-03-29 02:43:48 +06:00
Sergey M․
89924f8230 [devscripts/make_issue_template] Fix NameError under python3 2016-03-29 02:41:27 +06:00
Sergey M․
a39c68f7e5 Exclude make_issue_template.py from flake8 2016-03-29 02:19:24 +06:00
Sergey M․
4a5a67ca25 [devscripts/release.sh] Make ISSUE_TEMPLATE.md and commit it 2016-03-29 02:18:52 +06:00
Sergey M․
8751da85a7 [Makefile] Fix ISSUE_TEMPLATE.md target 2016-03-29 02:17:57 +06:00
Sergey M․
3bf1df51fd [devscripts/make_issue_template] Rework to use ISSUE_TEMPLATE.tmpl (Closes #8785) 2016-03-29 02:16:38 +06:00
Sergey M․
3842a3e652 Add ISSUE_TEMPLATE.tmpl as template for ISSUE_TEMPLATE.md 2016-03-29 02:15:26 +06:00
Sander van den Oever
7710bdf4e8 Add initial ISSUE_TEMPLATE
Add auto-updating of youtube-dl version in ISSUE_TEMPLATE

Move parts of template text and adopt makefile to new format

Moved the 'kind-of-issue' section and rephrased a bit

Rephrased and moved Example URL section upwards

Moved ISSUE_TEMPLATE inside .github folder.

Update makefile to match new folderstructure
2016-03-28 22:43:13 +06:00
Sergey M
8d9dd3c34b [README.md] Add format_id to the list of string meta fields available for use in format selection 2016-03-28 03:08:34 +05:00
Sergey M․
33f3040a3e [YoutubeDL] Fix sanitizing subtitles' url 2016-03-28 03:13:39 +06:00
Sergey M․
03442072c0 [pornhub] Fix typo (Closes #9008) 2016-03-28 01:21:44 +06:00
Sergey M․
c8b13fec02 [foxnews] Restore upload time fields in test 2016-03-28 01:14:12 +06:00
Sergey M․
87d105ac6c [amp] Fix upload timestamp extraction (Closes #9007) 2016-03-28 01:13:47 +06:00
Sergey M․
3454139576 [pornhub:uservideos] Add support for multipage videos (Closes #9006) 2016-03-28 00:50:46 +06:00
Sergey M․
3a23bae9cc [pornhub:playlistbase] Do not include videos not from playlist 2016-03-28 00:32:57 +06:00
Sergey M․
8f9a477e7f [pornhub:playlistbase] Use orderedSet 2016-03-28 00:21:08 +06:00
Sergey M․
a1cf3e38a3 [bbc] Extend vpid regex (Closes #9003) 2016-03-27 23:22:51 +06:00
Philipp Hagemeister
a122e7080b release 2016.03.27 2016-03-27 16:56:33 +02:00
Sergey M․
b22ca76204 [extractor/common] Filter out unsupported encrypted media for f4m formats (Closes #8573) 2016-03-27 07:42:38 +06:00
Sergey M․
f7df343b4a [downloader/f4m] Extract routine for removing unsupported encrypted media 2016-03-27 07:41:19 +06:00
Sergey M․
19dbaeece3 Remove _sort_formats from _extract_*_formats methods
Now _sort_formats should be called explicitly.
_sort_formats has been added to all the necessary places in code.

Closes #8051
2016-03-27 07:03:08 +06:00
Yen Chi Hsuan
395fd4b08a [twitter] Handle another form of embedded Vine
Fixes #8996
2016-03-27 04:36:02 +08:00
Sergey M․
8018028d0f [pluralsight] Extract chapter metadata (Closes #8993) 2016-03-27 02:10:52 +06:00
Sergey M․
00322ad4fd [lynda] Extract chapter metadata (#8993) 2016-03-27 02:00:36 +06:00
Sergey M․
4cf3489c6e [vevo] Update videoservice API URL (Closes #8900) 2016-03-27 01:11:11 +06:00
Sergey M․
b24ab3e341 [udemy] Improve paid course detection 2016-03-27 00:09:12 +06:00
Sergey M․
af4116f4f0 [udemy] Improve format_id 2016-03-27 00:02:52 +06:00
Sergey M․
f973e5d54e [udemy] Drop outputs' formats
Always results in 403
2016-03-26 23:55:07 +06:00
Sergey M․
62f55aa68a [udemy] Add outputs metadata to view_html formats 2016-03-26 23:54:12 +06:00
Sergey M․
02d7634d24 [udemy] Fix outputs' formats format_id 2016-03-26 23:43:25 +06:00
Sergey M․
48dce58ca9 [udemy] Use custom sorting 2016-03-26 23:42:46 +06:00
Sergey M․
efcba804f6 [udemy] Extract formats from view_html (Closes #8979) 2016-03-26 23:42:34 +06:00
Sergey M․
6dee688e6d [youtube:playlistsbase] Restrict playlist regex (Closes #8986) 2016-03-26 20:42:18 +06:00
Sergey M․
eedb7ba536 [YoutubeDL] Sort imports 2016-03-26 19:40:33 +06:00
Sergey M․
dcf77cf1a7 [YoutubeDL] Sanitize final URLs (Closes #8991) 2016-03-26 19:37:41 +06:00
Sergey M․
17bcc626bf [utils] Extract sanitize_url routine 2016-03-26 19:33:57 +06:00
Sergey M․
b5a5bbf376 [mailru] Extend _VALID_URL (Closes #8990) 2016-03-26 19:15:32 +06:00
Yen Chi Hsuan
e68d3a010f [twitter] Fix extraction (closes #8966)
HLS and DASH formats are no longer appeared in test cases. I keep them
for fear of triggering new errors.
2016-03-26 18:34:51 +08:00
Yen Chi Hsuan
d10fe8358c [generic] Add a test case for brightcove embed
Closes #8862
2016-03-26 18:30:43 +08:00
Yen Chi Hsuan
d6c340cae5 [brightcove] Extract more formats (#8862) 2016-03-26 18:21:07 +08:00
Yen Chi Hsuan
5964b598ff [brightcove] Support alternative BrightcoveExperience layout
The full URL lays in the `data` attribute of <object> (#8862)
2016-03-26 17:47:32 +08:00
Philipp Hagemeister
62cdb96f51 release 2016.03.26 2016-03-26 08:58:03 +01:00
Sergey M․
e289d6d62c [test_compat] Add tests for compat_urllib_parse_urlencode 2016-03-26 02:38:33 +06:00
Sergey M․
6e6bc8dae5 Use urlencode_postdata across the codebase 2016-03-26 02:19:24 +06:00
Sergey M․
15707c7e02 [compat] Add compat_urllib_parse_urlencode and eliminate encode_dict
encode_dict functionality has been improved and moved directly into compat_urllib_parse_urlencode
All occurrences of compat_urllib_parse.urlencode throughout the codebase have been replaced by compat_urllib_parse_urlencode

Closes #8974
2016-03-26 01:46:57 +06:00
Sergey M․
2156f16ca7 [thescene] Fix extraction and improve style (Closes #8978) 2016-03-25 20:14:34 +06:00
Sergey M․
4db441de72 [once] Relax _VALID_URL (Closes #8976) 2016-03-25 19:51:28 +06:00
Philipp Hagemeister
0be8314dc8 release 2016.03.25 2016-03-25 09:27:18 +01:00
Yen Chi Hsuan
d7f62b049a [iqiyi] Update enc_key 2016-03-25 15:45:40 +08:00
Yen Chi Hsuan
3bb3356812 [douyutv] Extend _VALID_URL 2016-03-25 15:43:29 +08:00
Sergey M․
3f15fec1d1 Credit @Kagami for mnet (#8958) 2016-03-25 03:56:27 +06:00
Sergey M․
98e68806fb [mnet] Improve (Closes #8958) 2016-03-25 03:26:29 +06:00
Kagami Hiiragi
e031768666 [mnet] Add new extractor 2016-03-25 02:32:06 +06:00
Sergey M․
5eb7db4ee9 [udemy] Add support for new URL schema 2016-03-25 02:28:39 +06:00
Sergey M․
f0e83681d9 [udemy] Extract formats from outputs 2016-03-25 02:27:13 +06:00
Sergey M․
ff9d5d0938 [udemy] Improve course enrolling 2016-03-25 02:26:46 +06:00
Sergey M․
d041a73674 [extractor/__init__] Add youtube:live and sort youtube extractors alphabetically 2016-03-25 01:39:25 +06:00
Sergey M․
f07e276a04 [youtube:live] Add extractor (Closes #8959) 2016-03-25 01:18:14 +06:00
Sergey M․
993271da0a [nytimes] Tolerate missing metadata (Closes #8952) 2016-03-24 23:28:24 +06:00
Sergey M․
369e7e3ff0 [iprima] Fix extraction (Closes #8953) 2016-03-24 22:54:26 +06:00
Sergey M․
5767b4eeae [mtv] Fix description extraction (Closes #8962) 2016-03-24 22:23:31 +06:00
Yen Chi Hsuan
622d19160b [utils] Clarify Python versions affected by buggy struct module 2016-03-24 18:06:15 +08:00
Yen Chi Hsuan
32d88410eb [tumblr] Add a test with Instagram embed
Closes #8817
2016-03-24 16:32:53 +08:00
Yen Chi Hsuan
5a51775a58 [generic] Extract Instagram embeds (#8817) 2016-03-24 16:32:27 +08:00
Yen Chi Hsuan
87696e78d7 [instagram] Unescape description (#8817) 2016-03-24 16:30:01 +08:00
Yen Chi Hsuan
c4096e8aea [instagram] Extract embed videos (#8817) 2016-03-24 16:29:33 +08:00
Yen Chi Hsuan
fc27ea9464 [tumblr] Support Vine embeds (#8817) 2016-03-23 23:55:52 +08:00
Yen Chi Hsuan
088e1aac59 [generic] Support Vine embeds (#8817) 2016-03-23 23:55:08 +08:00
Yen Chi Hsuan
81f36eba88 [test/test_utils] Update for escape_url change (again) 2016-03-23 23:23:26 +08:00
Yen Chi Hsuan
2d60465e44 [test/test_utils] Update for escape_url change 2016-03-23 23:20:28 +08:00
Sergey M
4333d56494 Merge pull request #8898 from dstftw/fragment-retries
Add --fragment-retries option (Fixes #8466)
2016-03-23 20:12:32 +05:00
Sergey M․
882c699296 [tunein] Fix stream data extraction (Closes #8899, closes #8924) 2016-03-23 20:45:39 +06:00
Yen Chi Hsuan
efbed08dc2 [utils] Encode hostnames before passing to urllib
With IDN (Internationalized Domain Name) and a proxy, non-ascii URLs
are passed down to urllib/urllib2, causing UnicodeEncodeError

Fixes #8890
2016-03-23 22:24:52 +08:00
Jaime Marquínez Ferrándiz
7da2c87119 Add extractor for thescene.com (closes #8929) 2016-03-22 22:17:59 +01:00
Sergey M․
c6ca11f1b3 [once] Prevent ads from embedding into m3u8 playlists (Closes #8893) 2016-03-22 23:48:05 +06:00
Sergey M․
2beeb286e1 [laola1tv] Add support for livestreams (Closes #8934) 2016-03-22 22:32:59 +06:00
Sergey M․
cc7397b04d [ceskatelevize] Make m3u8 formats extraction non fatal (Closes #8933) 2016-03-22 21:12:29 +06:00
Sergey M․
bc5d16b302 [animeondemand] Skip dash for now 2016-03-21 23:37:39 +06:00
Sergey M․
85c637b737 [animeondemand] Extract teaser when no full episode available (#8923) 2016-03-21 23:35:50 +06:00
Sergey M․
5c69f7a479 [animeondemand] Respect startvideo (Closes #8923) 2016-03-21 23:31:40 +06:00
Sergey M․
ff5873b72d [motherless] Detect friends only videos 2016-03-21 22:24:42 +06:00
Sergey M․
065c4b27bf [xhamster:embed] Extract vars (Closes #8912) 2016-03-21 22:07:34 +06:00
Sergey M․
1600ed1ff9 [rutv] Improve flash version pattern (Closes #8911) 2016-03-21 21:46:49 +06:00
Sergey M․
5886b38d73 Add support for https for all extractors as preventive and future-proof measure 2016-03-21 21:36:32 +06:00
Sergey M․
0cef27ad25 Add missing r prefix for _VALID_URLs 2016-03-21 21:22:37 +06:00
Sergey M․
12af4beb3e [mailru] Add support for https (Closes #8920) 2016-03-21 21:17:29 +06:00
Sergey M․
9016d76f71 [YoutubeDL] Improve _format_note 2016-03-20 22:01:45 +06:00
Sergey M․
3c5d183c19 [animeondemand] Extract all formats (Closes #8906) 2016-03-20 21:51:22 +06:00
Sergey M․
3e8bb9a972 [animeondemand] Detect geo restriction 2016-03-20 20:39:00 +06:00
Yen Chi Hsuan
daef04a4e7 [kwuo] Fix KuwoChartIE and KuwoSingerIE and accept new URL forms 2016-03-20 20:17:56 +08:00
Yen Chi Hsuan
7caae128a7 Credit @vitstradal for the key algorithm in OpenloadIE (#8489)
[ci skip]
2016-03-20 19:12:02 +08:00
Yen Chi Hsuan
2648918c81 [vlive] Fix creator extraction (closes #8814) 2016-03-20 18:15:53 +08:00
Jaime Marquínez Ferrándiz
920d318d3c README: document that BSD make is also supported (#8902) 2016-03-20 10:55:14 +01:00
Yen Chi Hsuan
9e3c2f1d74 [openload] Misc improvements
* Add thumbnail
* Detect errors (#6469)
* Match more (#6469, #8489)
2016-03-20 16:49:44 +08:00
Yen Chi Hsuan
2bfeee69b9 [openload] Add new extractor (closes #8489) 2016-03-20 15:54:58 +08:00
Yen Chi Hsuan
664bcd80b9 [tudou] Use InAdvancePagedList (closes #8884) 2016-03-20 15:45:31 +08:00
Sergey M․
3c20208eff [francetv] Improve formats extraction 2016-03-20 13:00:46 +06:00
Sergey M․
db264e3cc3 [francetvinfo] Add support for france3-regions and strip title (Closes #7673) 2016-03-20 12:44:04 +06:00
Sergey M
d396f30467 Merge pull request #8902 from jaimeMF/bmake
Makefile: make it compatible with bmake
2016-03-20 11:08:57 +05:00
Sergey M․
96a9f22d98 [discovery] Relax _VALID_URL (Closes #8903) 2016-03-20 10:26:58 +06:00
Sergey M․
40025ee2a3 [postprocessort/ffmpeg] Allow embedding webvtt into webm (Closes #8874) 2016-03-20 04:12:34 +06:00
Jaime Marquínez Ferrándiz
3ff63fb365 Makefile: make it compatible with bmake
It's the portable version of BSD make: http://crufty.net/help/sjg/bmake.html
The syntax for conditionals is different in GNU make and BSD make, so we use the shell
2016-03-19 21:51:13 +01:00
Jaime Marquínez Ferrándiz
5c7cd37ebd tox.ini: Exclude test_iqiyi_sdk_interpreter.py 2016-03-19 21:50:16 +01:00
Sergey M․
298c04b464 [91porn] Use common messages' wording 2016-03-20 02:35:48 +06:00
Sergey M․
d95114dd83 [91porn] Unquote final URL (Closes #8881) 2016-03-20 02:34:02 +06:00
Sergey M․
94dcade8f8 Credit @jjatria for biobiochiletv (#7314) 2016-03-20 01:36:20 +06:00
Sergey M․
fa023ccb2c [biobiochiletv] Fix extraction, extract m3u8 formats and overall improve (Closes #7314) 2016-03-20 01:31:55 +06:00
jjatria
e36f4aa72b [biobiotv] Add extractor 2016-03-20 01:29:08 +06:00
Sergey M․
9261e347cc Credit @kasper93 for cda (#8805) 2016-03-19 23:18:04 +06:00
Sergey M․
f1ced6df51 [cda] Improve and simplify (Closes #8805) 2016-03-19 23:17:14 +06:00
Kacper Michajłow
8b0d7a66ef [cda] Add new extractor for cda.pl
Fixes #8760
2016-03-19 22:42:40 +06:00
Sergey M․
3aec71766d [safari:api] Separate extractor (Closes #8871) 2016-03-19 22:30:48 +06:00
Sergey M․
16a8b7986b [downloader/fragment] Document fragment_retries 2016-03-19 20:54:21 +06:00
Sergey M․
617e58d850 [downloader/{common,fragment}] Fix total retries reporting on python 2.6 2016-03-19 20:51:30 +06:00
Sergey M․
e33baba0dd [downloader/dash] Add fragment retry capability
YouTube may often return 404 HTTP error for a fragment causing the
whole download to fail. However if the same fragment is immediately
retried with the same request data this usually succeeds (1-2 attemps
is usually enough) thus allowing to download the whole file successfully.
So, we will retry all fragments that fail with 404 HTTP error for now.
2016-03-19 20:42:23 +06:00
Sergey M․
721f26b821 [downloader/fragment] Add report_retry_fragment 2016-03-19 20:41:24 +06:00
Sergey M․
52bb437e41 [options] Add --fragment-retries option 2016-03-19 20:40:36 +06:00
Jaime Marquínez Ferrándiz
782b1b5bd1 [utils] lookup_unit_table: Match word boundary instead of end of string 2016-03-19 11:44:49 +01:00
Sergey M․
0d769bcb78 [extractor/generic] Fix missing byte literal prefix 2016-03-19 05:43:43 +06:00
remitamine
4cd70099ea [hbo] Add new extractor 2016-03-18 21:18:18 +01:00
Jaime Marquínez Ferrándiz
09fc33198a utils: lookup_unit_table: Use a stricter regex
In parse_count multiple units start with the same letter, so it would match different units depending on the order they were sorted when iterating over them.
2016-03-18 19:23:06 +01:00
Sergey M․
4c3b16d5d1 [test_YoutubeDL] Add test for format_id format selection 2016-03-19 00:04:26 +06:00
John Peel
d5aacf9a90 Added format_id to the filers on -f. 2016-03-18 23:59:24 +06:00
Sergey M․
19e2617a6f [commonprotocols] Add generic support for rtmp URLs (Closes #8488) 2016-03-18 23:42:15 +06:00
Sergey M․
edd9b71c2c [extractor/generic] Add a test for m3u playlist served without proper Content-Type 2016-03-18 22:49:11 +06:00
Sergey M․
5940862d5a [extractor/generic] Detect m3u playlists served without proper Content-Type 2016-03-18 22:45:28 +06:00
Sergey M․
de6c51e88e [extractor/generic] Fix direct link semantics 2016-03-18 22:43:07 +06:00
Sergey M․
303dcdb995 [extractor/generic] Simplify upload_date extraction 2016-03-18 22:41:16 +06:00
Sergey M․
20938f768b [extractor/generic] Add another test for generic m3u8 2016-03-18 21:54:33 +06:00
Sergey M․
955737b2d4 [extractor/generic] Force Content-Type to lowecase 2016-03-18 21:50:44 +06:00
Sergey M․
263eff9537 [extractor/generic] Properly extract format id from Content-Type
Fixes extraction for cases like: audio/x-mpegURL; charset=utf-8
2016-03-18 21:50:10 +06:00
Sergey M․
cae21032ab [theplatform] Improve geo restriction detection 2016-03-18 21:08:25 +06:00
remitamine
6187091532 [once] check http formats availability 2016-03-18 11:51:34 +01:00
Philipp Hagemeister
0d33166ec5 release 2016.03.18 2016-03-18 11:43:48 +01:00
remitamine
87c03c6bd2 [theplatform] remove unnecessary import 2016-03-18 09:43:28 +01:00
remitamine
4c92fd2e83 [theplatform] always force theplatform to return a smil for _extract_theplatform_smil 2016-03-18 09:22:10 +01:00
Sergey M․
e3d17b3c07 [noz] Fix extraction on python 2.6 by means of using compat_xpath 2016-03-18 02:54:27 +06:00
Sergey M․
810c10baa1 [utils] Use compat_xpath 2016-03-18 02:52:23 +06:00
Sergey M․
57f7e3c62d [compat] Add compat_xpath 2016-03-18 02:51:38 +06:00
Sergey M․
0d0e282912 [animeondemand] Fix typo and improve 2016-03-18 00:13:50 +06:00
Sergey M․
85e8f26b82 [animeondemand] Improve extraction 2016-03-18 00:02:34 +06:00
Sergey M․
b57fecfddd [animeondemand] Add test 2016-03-17 23:50:10 +06:00
Sergey M․
8c97e7efb6 [animeondemand] Expand episode title regex (Closes #8875) 2016-03-17 23:43:14 +06:00
Sergey M․
cc162f6a0a [crunchyroll] Fix custom _download_webpage (Closes #8883) 2016-03-17 22:55:04 +06:00
remitamine
cf45ed786e [wistia] extract more metadata 2016-03-17 17:48:17 +01:00
remitamine
574b2a7393 [nbc:nbcnews] improve extraction(fixes #6922)
- extract more metadata and formats
- relax regex
2016-03-17 16:11:29 +01:00
remitamine
9f02ff537c [theplatform] extract brightcove once formats 2016-03-17 16:11:29 +01:00
remitamine
0436ec0e7a [once] Add new format extractor 2016-03-17 16:11:29 +01:00
Yen Chi Hsuan
11f12195af [youtube] Added itag 91
Seen in https://www.youtube.com/watch?v=jMN4cxyhJjk
2016-03-17 19:25:37 +08:00
remitamine
a646a8cf98 [sbs] improve extraction(fixes #3811)
- extract error messages
- force the platform smil url(previously the manifest param
in the query is not respected which make theplatform return non working
mp4 files for some videos)
2016-03-17 02:07:06 +01:00
remitamine
63f41d3821 [bravotv] Add new extractor(#4657) 2016-03-16 21:26:25 +01:00
Sergey M․
c5229f3926 [utils] PEP 8 2016-03-16 21:50:04 +06:00
Sergey M․
96f4f796fb [brightcover] Remove unused import 2016-03-16 21:47:51 +06:00
Sergey M․
70cab344c4 [udemy] Improve course id v4 regex 2016-03-16 21:46:09 +06:00
Quan Hua
a7ba57dc17 [udemy] Update course id regex to cover v4 layout (Closes #8753, closes #8868, closes #8870) 2016-03-16 21:45:01 +06:00
remitamine
83548824c2 Merge pull request #8092 from bpfoley/twitter-thumbnail
[utils] Add extract_attributes for extracting html tag attributes
2016-03-16 13:16:27 +01:00
remitamine
354dbbd880 [brightcove:new] extract protocol-less embed URLs(closes #2914) 2016-03-16 11:46:53 +01:00
remitamine
23edc49509 [tv3] Add new extractor(closes #8059) 2016-03-16 10:47:39 +01:00
remitamine
48254c3f2c [brightcove] some improvements and fixes
- use FFmpeg downloader to download m3u8 formats extracted
from BrightcoveNew(some of the m3u8 media playlists use AES-128)
- update comment and update_url_query to handle url query
2016-03-16 09:21:07 +01:00
remitamine
2cab48704c [thestar] Add new extractor(closes #5955) 2016-03-15 23:10:31 +01:00
remitamine
64d4f31d78 [brightcove:new] update embed_in_page embeds regex to match non numeric ref id 2016-03-15 22:50:43 +01:00
remitamine
0c9ff24041 [noz] fix extraction in python 2.6 2016-03-15 21:00:39 +01:00
Yen Chi Hsuan
3ff8279e80 [kuwo:mv] Fix the test and extraction of georestricted MVs 2016-03-16 02:41:18 +08:00
remitamine
cb6e477dfe [aljazeera] update the extractor to use BrightcoveNewIE 2016-03-15 19:38:10 +01:00
remitamine
edfd93518e [svt] extract dashhbbtv formats(#8867) 2016-03-15 19:33:09 +01:00
remitamine
89807d6a82 [brightcove] extract dash formats and detect audio formats 2016-03-15 18:48:21 +01:00
remitamine
49dea4913b Merge pull request #8513 from remitamine/dash-sort
[extractor/common] fix dash formats sorting
2016-03-15 18:39:50 +01:00
Sergey M․
dec2cae0a7 [twitch:playlistbase] Clarify pagination bug
Pagination bug has been fixed by twitch on 15.03.2016.
2016-03-15 21:45:43 +06:00
remitamine
cf6cd07396 [noz] extract f4m and m3u8 formats 2016-03-15 15:24:12 +01:00
remitamine
975b9c9ab0 [brightcove:new] detect m3u8 manifests by M2TS container 2016-03-15 10:06:53 +01:00
remitamine
8ac73bdbe4 [brightcove:new] Add support for non numeric ref: preffixed video ids 2016-03-15 10:03:08 +01:00
remitamine
877f440f7b [rice] Add new extractor(closes #1736) 2016-03-15 00:49:23 +01:00
remitamine
d13bdc3824 [brightcove] raise ExtractorError on 403 errors and fix regex to work with tenplay 2016-03-14 22:24:52 +01:00
remitamine
744daf9418 [gameinformer] remove unused imports 2016-03-14 21:57:26 +01:00
remitamine
bf475e1990 [tlc] fix extraction and update extractor to use BrightcoveNewIE 2016-03-14 21:53:00 +01:00
remitamine
203f3d779a [gameinformer] update the extractor to use BrightcoveNewIE 2016-03-14 18:32:29 +01:00
remitamine
4230c4894d [external/downloader] fix rtmp downloading using FFmpegFD 2016-03-14 16:51:01 +01:00
Brian Foley
8bb56eeeea [utils] Add extract_attributes for extracting html tag attributes
This is much more robust than just using regexps, and handles all
the common scenarios, such as empty/no values, repeated attributes,
entity decoding, mixed case names, and the different possible value
quoting schemes.
2016-03-03 10:11:37 +00:00
remitamine
dd86780596 [extractor/common] fix dash formats sorting 2016-02-11 10:55:50 +01:00
287 changed files with 3696 additions and 1280 deletions

58
.github/ISSUE_TEMPLATE.md vendored Normal file
View File

@@ -0,0 +1,58 @@
## Please follow the guide below
- You will be asked some questions and requested to provide some information, please read them **carefully** and answer honestly
- Put an `x` into all the boxes [ ] relevant to your *issue* (like that [x])
- Use *Preview* tab to see how your issue will actually look like
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.04.06*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.04.06**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
### What is the purpose of your *issue*?
- [ ] Bug report (encountered problems with youtube-dl)
- [ ] Site support request (request for adding support for a new site)
- [ ] Feature request (request for a new functionality)
- [ ] Question
- [ ] Other
---
### The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your *issue*
---
### If the purpose of this *issue* is a *bug report*, *site support request* or you are not completely sure provide the full verbose output as follows:
Add `-v` flag to **your command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
```
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.04.06
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
...
<end of log>
```
---
### If the purpose of this *issue* is a *site support request* please provide all kinds of example URLs support for which should be included (replace following example URLs by **yours**):
- Single video: https://www.youtube.com/watch?v=BaW_jenozKc
- Single video: https://youtu.be/BaW_jenozKc
- Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
---
### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
If work on your *issue* required an account credentials please provide them or explain how one can obtain them.

58
.github/ISSUE_TEMPLATE_tmpl.md vendored Normal file
View File

@@ -0,0 +1,58 @@
## Please follow the guide below
- You will be asked some questions and requested to provide some information, please read them **carefully** and answer honestly
- Put an `x` into all the boxes [ ] relevant to your *issue* (like that [x])
- Use *Preview* tab to see how your issue will actually look like
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *%(version)s*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **%(version)s**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
### What is the purpose of your *issue*?
- [ ] Bug report (encountered problems with youtube-dl)
- [ ] Site support request (request for adding support for a new site)
- [ ] Feature request (request for a new functionality)
- [ ] Question
- [ ] Other
---
### The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your *issue*
---
### If the purpose of this *issue* is a *bug report*, *site support request* or you are not completely sure provide the full verbose output as follows:
Add `-v` flag to **your command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
```
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version %(version)s
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
...
<end of log>
```
---
### If the purpose of this *issue* is a *site support request* please provide all kinds of example URLs support for which should be included (replace following example URLs by **yours**):
- Single video: https://www.youtube.com/watch?v=BaW_jenozKc
- Single video: https://youtu.be/BaW_jenozKc
- Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
---
### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
If work on your *issue* required an account credentials please provide them or explain how one can obtain them.

View File

@@ -163,3 +163,7 @@ Patrick Griffis
Aidan Rowe
mutantmonkey
Ben Congdon
Kacper Michajłow
José Joaquín Atria
Viťas Strádal
Kagami Hiiragi

View File

@@ -85,7 +85,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
If you want to create a build of youtube-dl yourself, you'll need
* python
* make
* make (both GNU make and BSD make are supported)
* pandoc
* zip
* nosetests

View File

@@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean:
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp youtube-dl youtube-dl.exe
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete
find . -name "*.class" -delete
@@ -12,15 +12,7 @@ SHAREDIR ?= $(PREFIX)/share
PYTHON ?= /usr/bin/env python
# set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
ifeq ($(PREFIX),/usr)
SYSCONFDIR=/etc
else
ifeq ($(PREFIX),/usr/local)
SYSCONFDIR=/etc
else
SYSCONFDIR=$(PREFIX)/etc
endif
endif
SYSCONFDIR != if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi
install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
install -d $(DESTDIR)$(BINDIR)
@@ -67,6 +59,9 @@ README.md: youtube_dl/*.py youtube_dl/*/*.py
CONTRIBUTING.md: README.md
$(PYTHON) devscripts/make_contributing.py README.md CONTRIBUTING.md
.github/ISSUE_TEMPLATE.md: devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl.md youtube_dl/version.py
$(PYTHON) devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl.md .github/ISSUE_TEMPLATE.md
supportedsites:
$(PYTHON) devscripts/make_supportedsites.py docs/supportedsites.md

View File

@@ -164,6 +164,8 @@ which means you can modify it, redistribute it or use it however you like.
(e.g. 50K or 4.2M)
-R, --retries RETRIES Number of retries (default is 10), or
"infinite".
--fragment-retries RETRIES Number of retries for a fragment (default
is 10), or "infinite" (DASH only)
--buffer-size SIZE Size of download buffer (e.g. 1024 or 16K)
(default is 1024)
--no-resize-buffer Do not automatically adjust the buffer
@@ -376,8 +378,8 @@ which means you can modify it, redistribute it or use it however you like.
--no-post-overwrites Do not overwrite post-processed files; the
post-processed files are overwritten by
default
--embed-subs Embed subtitles in the video (only for mkv
and mp4 videos)
--embed-subs Embed subtitles in the video (only for mp4,
webm and mkv videos)
--embed-thumbnail Embed thumbnail in the audio as cover art
--add-metadata Write metadata to the video file
--metadata-from-title FORMAT Parse additional metadata like song title /
@@ -598,6 +600,7 @@ Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begin
- `vcodec`: Name of the video codec in use
- `container`: Name of the container format
- `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native`
- `format_id`: A short description of the format
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by video hoster.
@@ -831,7 +834,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
If you want to create a build of youtube-dl yourself, you'll need
* python
* make
* make (both GNU make and BSD make are supported)
* pandoc
* zip
* nosetests

View File

@@ -0,0 +1,29 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import optparse
def main():
parser = optparse.OptionParser(usage='%prog INFILE OUTFILE')
options, args = parser.parse_args()
if len(args) != 2:
parser.error('Expected an input and an output filename')
infile, outfile = args
with io.open(infile, encoding='utf-8') as inf:
issue_template_tmpl = inf.read()
# Get the version from youtube_dl/version.py without importing the package
exec(compile(open('youtube_dl/version.py').read(),
'youtube_dl/version.py', 'exec'))
out = issue_template_tmpl % {'version': locals()['__version__']}
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
if __name__ == '__main__':
main()

View File

@@ -45,9 +45,9 @@ fi
/bin/echo -e "\n### Changing version in version.py..."
sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
/bin/echo -e "\n### Committing documentation and youtube_dl/version.py..."
make README.md CONTRIBUTING.md supportedsites
git add README.md CONTRIBUTING.md docs/supportedsites.md youtube_dl/version.py
/bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py
git commit -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..."

View File

@@ -57,6 +57,7 @@
- **AudioBoom**
- **audiomack**
- **audiomack:album**
- **auroravid**: AuroraVid
- **Azubu**
- **AzubuLive**
- **BaiduVideo**: 百度视频
@@ -74,6 +75,7 @@
- **Bigflix**
- **Bild**: Bild.de
- **BiliBili**
- **BioBioChileTV**
- **BleacherReport**
- **BleacherReportCMS**
- **blinkx**
@@ -81,6 +83,7 @@
- **BokeCC**
- **Bpb**: Bundeszentrale für politische Bildung
- **BR**: Bayerischer Rundfunk Mediathek
- **BravoTV**
- **Break**
- **brightcove:legacy**
- **brightcove:new**
@@ -90,15 +93,18 @@
- **BYUtv**
- **Camdemy**
- **CamdemyFolder**
- **CamWithHer**
- **canalc2.tv**
- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
- **Canvas**
- **CBC**
- **CBCPlayer**
- **CBS**
- **CBSInteractive**
- **CBSNews**: CBS News
- **CBSNewsLiveVideo**: CBS News Live Videos
- **CBSSports**
- **CDA**
- **CeskaTelevize**
- **channel9**: Channel 9
- **Chaturbate**
@@ -115,7 +121,7 @@
- **Clubic**
- **Clyp**
- **cmt.com**
- **CNET**
- **CNBC**
- **CNN**
- **CNNArticle**
- **CNNBlogs**
@@ -131,6 +137,7 @@
- **CrooksAndLiars**
- **Crunchyroll**
- **crunchyroll:playlist**
- **CSNNE**
- **CSpan**: C-SPAN
- **CtsNews**: 華視新聞
- **culturebox.francetvinfo.fr**
@@ -243,6 +250,7 @@
- **GPUTechConf**
- **Groupon**
- **Hark**
- **HBO**
- **HearThisAt**
- **Heise**
- **HellPorno**
@@ -343,6 +351,7 @@
- **MiTele**: mitele.es
- **mixcloud**
- **MLB**
- **Mnet**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex**
- **Mojvideo**
@@ -371,7 +380,8 @@
- **myvideo** (Currently broken)
- **MyVidster**
- **n-tv.de**
- **NationalGeographic**
- **natgeo**
- **natgeo:channel**
- **Naver**
- **NBA**
- **NBC**
@@ -411,7 +421,6 @@
- **Normalboots**
- **NosVideo**
- **Nova**: TN.cz, Prásk.tv, Nova.cz, Novaplus.cz, FANDA.tv, Krásná.cz and Doma.cz
- **novamov**: NovaMov
- **nowness**
- **nowness:playlist**
- **nowness:series**
@@ -439,6 +448,7 @@
- **OnionStudios**
- **Ooyala**
- **OoyalaExternal**
- **Openload**
- **OraTV**
- **orf:fm4**: radio FM4
- **orf:iptv**: iptv.ORF.at
@@ -499,6 +509,7 @@
- **Restudy**
- **ReverbNation**
- **Revision3**
- **RICE**
- **RingTV**
- **RottenTomatoes**
- **Roxwel**
@@ -523,6 +534,7 @@
- **RUTV**: RUTV.RU
- **Ruutu**
- **safari**: safaribooksonline.com online video
- **safari:api**
- **safari:course**: safaribooksonline.com online courses
- **Sandia**: Sandia National Laboratories
- **Sapo**: SAPO Vídeos
@@ -610,13 +622,14 @@
- **Telegraaf**
- **TeleMB**
- **TeleTask**
- **TenPlay**
- **TF1**
- **TheIntercept**
- **TheOnion**
- **ThePlatform**
- **ThePlatformFeed**
- **TheScene**
- **TheSixtyOne**
- **TheStar**
- **ThisAmericanLife**
- **ThisAV**
- **THVideo**
@@ -650,6 +663,7 @@
- **tv.dfb.de**
- **TV2**
- **TV2Article**
- **TV3**
- **TV4**: tv4.se and tv4play.se
- **TVC**
- **TVCArticle**
@@ -729,6 +743,7 @@
- **vlive**
- **Vodlocker**
- **VoiceRepublic**
- **VoxMedia**
- **Vporn**
- **vpro**: npo.nl and ntr.nl
- **VRT**
@@ -782,6 +797,7 @@
- **youtube:channel**: YouTube.com channels
- **youtube:favorites**: YouTube.com favourite videos, ":ytfav" for short (requires authentication)
- **youtube:history**: Youtube watch history, ":ythistory" for short (requires authentication)
- **youtube:live**: YouTube.com live streams
- **youtube:playlist**: YouTube.com playlists
- **youtube:playlists**: YouTube.com user/channel playlists
- **youtube:recommended**: YouTube.com recommended videos, ":ytrec" for short (requires authentication)

View File

@@ -2,5 +2,5 @@
universal = True
[flake8]
exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,setup.py,build,.git
exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/make_issue_template.py,setup.py,build,.git
ignore = E402,E501,E731

View File

@@ -222,6 +222,11 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'dash-video-low')
ydl = YDL({'format': 'bestvideo[format_id^=dash][format_id$=low]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'dash-video-low')
formats = [
{'format_id': 'vid-vcodec-dot', 'ext': 'mp4', 'preference': 1, 'vcodec': 'avc1.123456', 'acodec': 'none', 'url': TEST_URL},
]

View File

@@ -19,6 +19,7 @@ from youtube_dl.compat import (
compat_str,
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlencode,
)
@@ -70,6 +71,16 @@ class TestCompat(unittest.TestCase):
self.assertEqual(compat_urllib_parse_unquote_plus('abc%20def'), 'abc def')
self.assertEqual(compat_urllib_parse_unquote_plus('%7e/abc+def'), '~/abc def')
def test_compat_urllib_parse_urlencode(self):
self.assertEqual(compat_urllib_parse_urlencode({'abc': 'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode({'abc': b'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode({b'abc': 'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode({b'abc': b'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([('abc', 'def')]), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([('abc', b'def')]), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([(b'abc', 'def')]), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([(b'abc', b'def')]), 'abc=def')
def test_compat_shlex_split(self):
self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])

View File

@@ -1,4 +1,5 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
# Allow direct execution
@@ -120,5 +121,14 @@ class TestProxy(unittest.TestCase):
response = ydl.urlopen(req).read().decode('utf-8')
self.assertEqual(response, 'cn: {0}'.format(url))
def test_proxy_with_idn(self):
ydl = YoutubeDL({
'proxy': 'localhost:{0}'.format(self.port),
})
url = 'http://中文.tw/'
response = ydl.urlopen(url).read().decode('utf-8')
# b'xn--fiq228c' is '中文'.encode('idna')
self.assertEqual(response, 'normal: http://xn--fiq228c.tw/')
if __name__ == '__main__':
unittest.main()

View File

@@ -28,6 +28,7 @@ from youtube_dl.utils import (
encodeFilename,
escape_rfc3986,
escape_url,
extract_attributes,
ExtractorError,
find_xpath_attr,
fix_xml_ampersands,
@@ -77,6 +78,7 @@ from youtube_dl.utils import (
cli_bool_option,
)
from youtube_dl.compat import (
compat_chr,
compat_etree_fromstring,
compat_urlparse,
compat_parse_qs,
@@ -575,11 +577,11 @@ class TestUtil(unittest.TestCase):
)
self.assertEqual(
escape_url('http://тест.рф/фрагмент'),
'http://тест.рф/%D1%84%D1%80%D0%B0%D0%B3%D0%BC%D0%B5%D0%BD%D1%82'
'http://xn--e1aybc.xn--p1ai/%D1%84%D1%80%D0%B0%D0%B3%D0%BC%D0%B5%D0%BD%D1%82'
)
self.assertEqual(
escape_url('http://тест.рф/абв?абв=абв#абв'),
'http://тест.рф/%D0%B0%D0%B1%D0%B2?%D0%B0%D0%B1%D0%B2=%D0%B0%D0%B1%D0%B2#%D0%B0%D0%B1%D0%B2'
'http://xn--e1aybc.xn--p1ai/%D0%B0%D0%B1%D0%B2?%D0%B0%D0%B1%D0%B2=%D0%B0%D0%B1%D0%B2#%D0%B0%D0%B1%D0%B2'
)
self.assertEqual(escape_url('http://vimeo.com/56015672#at=0'), 'http://vimeo.com/56015672#at=0')
@@ -629,6 +631,44 @@ class TestUtil(unittest.TestCase):
on = js_to_json('{"abc": "def",}')
self.assertEqual(json.loads(on), {'abc': 'def'})
def test_extract_attributes(self):
self.assertEqual(extract_attributes('<e x="y">'), {'x': 'y'})
self.assertEqual(extract_attributes("<e x='y'>"), {'x': 'y'})
self.assertEqual(extract_attributes('<e x=y>'), {'x': 'y'})
self.assertEqual(extract_attributes('<e x="a \'b\' c">'), {'x': "a 'b' c"})
self.assertEqual(extract_attributes('<e x=\'a "b" c\'>'), {'x': 'a "b" c'})
self.assertEqual(extract_attributes('<e x="&#121;">'), {'x': 'y'})
self.assertEqual(extract_attributes('<e x="&#x79;">'), {'x': 'y'})
self.assertEqual(extract_attributes('<e x="&amp;">'), {'x': '&'}) # XML
self.assertEqual(extract_attributes('<e x="&quot;">'), {'x': '"'})
self.assertEqual(extract_attributes('<e x="&pound;">'), {'x': '£'}) # HTML 3.2
self.assertEqual(extract_attributes('<e x="&lambda;">'), {'x': 'λ'}) # HTML 4.0
self.assertEqual(extract_attributes('<e x="&foo">'), {'x': '&foo'})
self.assertEqual(extract_attributes('<e x="\'">'), {'x': "'"})
self.assertEqual(extract_attributes('<e x=\'"\'>'), {'x': '"'})
self.assertEqual(extract_attributes('<e x >'), {'x': None})
self.assertEqual(extract_attributes('<e x=y a>'), {'x': 'y', 'a': None})
self.assertEqual(extract_attributes('<e x= y>'), {'x': 'y'})
self.assertEqual(extract_attributes('<e x=1 y=2 x=3>'), {'y': '2', 'x': '3'})
self.assertEqual(extract_attributes('<e \nx=\ny\n>'), {'x': 'y'})
self.assertEqual(extract_attributes('<e \nx=\n"y"\n>'), {'x': 'y'})
self.assertEqual(extract_attributes("<e \nx=\n'y'\n>"), {'x': 'y'})
self.assertEqual(extract_attributes('<e \nx="\ny\n">'), {'x': '\ny\n'})
self.assertEqual(extract_attributes('<e CAPS=x>'), {'caps': 'x'}) # Names lowercased
self.assertEqual(extract_attributes('<e x=1 X=2>'), {'x': '2'})
self.assertEqual(extract_attributes('<e X=1 x=2>'), {'x': '2'})
self.assertEqual(extract_attributes('<e _:funny-name1=1>'), {'_:funny-name1': '1'})
self.assertEqual(extract_attributes('<e x="Fáilte 世界 \U0001f600">'), {'x': 'Fáilte 世界 \U0001f600'})
self.assertEqual(extract_attributes('<e x="décompose&#769;">'), {'x': 'décompose\u0301'})
# "Narrow" Python builds don't support unicode code points outside BMP.
try:
compat_chr(0x10000)
supports_outside_bmp = True
except ValueError:
supports_outside_bmp = False
if supports_outside_bmp:
self.assertEqual(extract_attributes('<e x="Smile &#128512;!">'), {'x': 'Smile \U0001f600!'})
def test_clean_html(self):
self.assertEqual(clean_html('a:\nb'), 'a: b')
self.assertEqual(clean_html('a:\n "b"'), 'a: "b"')
@@ -662,6 +702,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_count('1.000'), 1000)
self.assertEqual(parse_count('1.1k'), 1100)
self.assertEqual(parse_count('1.1kk'), 1100000)
self.assertEqual(parse_count('1.1kk '), 1100000)
self.assertEqual(parse_count('1.1kk views'), 1100000)
def test_version_tuple(self):
self.assertEqual(version_tuple('1'), (1,))

View File

@@ -8,6 +8,6 @@ deps =
passenv = HOME
defaultargs = test --exclude test_download.py --exclude test_age_restriction.py
--exclude test_subtitles.py --exclude test_write_annotations.py
--exclude test_youtube_lists.py
--exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
commands = nosetests --verbose {posargs:{[testenv]defaultargs}} # --with-coverage --cover-package=youtube_dl --cover-html
# test.test_download:TestDownload.test_NowVideo

View File

@@ -39,6 +39,8 @@ from .compat import (
compat_urllib_request_DataHandler,
)
from .utils import (
age_restricted,
args_to_str,
ContentTooShortError,
date_from_str,
DateRange,
@@ -58,13 +60,16 @@ from .utils import (
PagedList,
parse_filesize,
PerRequestProxyHandler,
PostProcessingError,
platform_name,
PostProcessingError,
preferredencoding,
prepend_extension,
render_table,
replace_extension,
SameFileError,
sanitize_filename,
sanitize_path,
sanitize_url,
sanitized_Request,
std_headers,
subtitles_filename,
@@ -75,10 +80,6 @@ from .utils import (
write_string,
YoutubeDLCookieProcessor,
YoutubeDLHandler,
prepend_extension,
replace_extension,
args_to_str,
age_restricted,
)
from .cache import Cache
from .extractor import get_info_extractor, gen_extractors
@@ -905,7 +906,7 @@ class YoutubeDL(object):
'*=': lambda attr, value: value in attr,
}
str_operator_rex = re.compile(r'''(?x)
\s*(?P<key>ext|acodec|vcodec|container|protocol)
\s*(?P<key>ext|acodec|vcodec|container|protocol|format_id)
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?
\s*(?P<value>[a-zA-Z0-9._-]+)
\s*$
@@ -1229,6 +1230,7 @@ class YoutubeDL(object):
t.get('preference'), t.get('width'), t.get('height'),
t.get('id'), t.get('url')))
for i, t in enumerate(thumbnails):
t['url'] = sanitize_url(t['url'])
if t.get('width') and t.get('height'):
t['resolution'] = '%dx%d' % (t['width'], t['height'])
if t.get('id') is None:
@@ -1263,6 +1265,8 @@ class YoutubeDL(object):
if subtitles:
for _, subtitle in subtitles.items():
for subtitle_format in subtitle:
if subtitle_format.get('url'):
subtitle_format['url'] = sanitize_url(subtitle_format['url'])
if 'ext' not in subtitle_format:
subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
@@ -1292,6 +1296,8 @@ class YoutubeDL(object):
if 'url' not in format:
raise ExtractorError('Missing "url" key in result (index %d)' % i)
format['url'] = sanitize_url(format['url'])
if format.get('format_id') is None:
format['format_id'] = compat_str(i)
else:
@@ -1836,7 +1842,7 @@ class YoutubeDL(object):
if fdict.get('language'):
if res:
res += ' '
res += '[%s]' % fdict['language']
res += '[%s] ' % fdict['language']
if fdict.get('format_note') is not None:
res += fdict['format_note'] + ' '
if fdict.get('tbr') is not None:

View File

@@ -144,14 +144,20 @@ def _real_main(argv=None):
if numeric_limit is None:
parser.error('invalid max_filesize specified')
opts.max_filesize = numeric_limit
if opts.retries is not None:
if opts.retries in ('inf', 'infinite'):
opts_retries = float('inf')
def parse_retries(retries):
if retries in ('inf', 'infinite'):
parsed_retries = float('inf')
else:
try:
opts_retries = int(opts.retries)
parsed_retries = int(retries)
except (TypeError, ValueError):
parser.error('invalid retry count specified')
return parsed_retries
if opts.retries is not None:
opts.retries = parse_retries(opts.retries)
if opts.fragment_retries is not None:
opts.fragment_retries = parse_retries(opts.fragment_retries)
if opts.buffersize is not None:
numeric_buffersize = FileDownloader.parse_bytes(opts.buffersize)
if numeric_buffersize is None:
@@ -299,7 +305,8 @@ def _real_main(argv=None):
'force_generic_extractor': opts.force_generic_extractor,
'ratelimit': opts.ratelimit,
'nooverwrites': opts.nooverwrites,
'retries': opts_retries,
'retries': opts.retries,
'fragment_retries': opts.fragment_retries,
'buffersize': opts.buffersize,
'noresizebuffer': opts.noresizebuffer,
'continuedl': opts.continue_dl,

View File

@@ -77,6 +77,11 @@ try:
except ImportError: # Python 2
from urllib import urlretrieve as compat_urlretrieve
try:
from html.parser import HTMLParser as compat_HTMLParser
except ImportError: # Python 2
from HTMLParser import HTMLParser as compat_HTMLParser
try:
from subprocess import DEVNULL
@@ -164,6 +169,32 @@ except ImportError: # Python 2
string = string.replace('+', ' ')
return compat_urllib_parse_unquote(string, encoding, errors)
try:
from urllib.parse import urlencode as compat_urllib_parse_urlencode
except ImportError: # Python 2
# Python 2 will choke in urlencode on mixture of byte and unicode strings.
# Possible solutions are to either port it from python 3 with all
# the friends or manually ensure input query contains only byte strings.
# We will stick with latter thus recursively encoding the whole query.
def compat_urllib_parse_urlencode(query, doseq=0, encoding='utf-8'):
def encode_elem(e):
if isinstance(e, dict):
e = encode_dict(e)
elif isinstance(e, (list, tuple,)):
list_e = encode_list(e)
e = tuple(list_e) if isinstance(e, tuple) else list_e
elif isinstance(e, compat_str):
e = e.encode(encoding)
return e
def encode_dict(d):
return dict((encode_elem(k), encode_elem(v)) for k, v in d.items())
def encode_list(l):
return [encode_elem(e) for e in l]
return compat_urllib_parse.urlencode(encode_elem(query), doseq=doseq)
try:
from urllib.request import DataHandler as compat_urllib_request_DataHandler
except ImportError: # Python < 3.4
@@ -251,6 +282,16 @@ else:
el.text = el.text.decode('utf-8')
return doc
if sys.version_info < (2, 7):
# Here comes the crazy part: In 2.6, if the xpath is a unicode,
# .//node does not match if a node is a direct child of . !
def compat_xpath(xpath):
if isinstance(xpath, compat_str):
xpath = xpath.encode('ascii')
return xpath
else:
compat_xpath = lambda xpath: xpath
try:
from urllib.parse import parse_qs as compat_parse_qs
except ImportError: # Python 2
@@ -543,6 +584,7 @@ else:
from tokenize import generate_tokens as compat_tokenize_tokenize
__all__ = [
'compat_HTMLParser',
'compat_HTTPError',
'compat_basestring',
'compat_chr',
@@ -572,6 +614,7 @@ __all__ = [
'compat_urllib_parse_unquote',
'compat_urllib_parse_unquote_plus',
'compat_urllib_parse_unquote_to_bytes',
'compat_urllib_parse_urlencode',
'compat_urllib_parse_urlparse',
'compat_urllib_request',
'compat_urllib_request_DataHandler',
@@ -579,6 +622,7 @@ __all__ = [
'compat_urlparse',
'compat_urlretrieve',
'compat_xml_parse_error',
'compat_xpath',
'shlex_quote',
'subprocess_check_output',
'workaround_optparse_bug9161',

View File

@@ -115,6 +115,10 @@ class FileDownloader(object):
return '%10s' % '---b/s'
return '%10s' % ('%s/s' % format_bytes(speed))
@staticmethod
def format_retries(retries):
return 'inf' if retries == float('inf') else '%.0f' % retries
@staticmethod
def best_block_size(elapsed_time, bytes):
new_min = max(bytes / 2.0, 1.0)
@@ -297,7 +301,9 @@ class FileDownloader(object):
def report_retry(self, count, retries):
"""Report retry in case of HTTP error 5xx"""
self.to_screen('[download] Got server HTTP error. Retrying (attempt %d of %.0f)...' % (count, retries))
self.to_screen(
'[download] Got server HTTP error. Retrying (attempt %d of %s)...'
% (count, self.format_retries(retries)))
def report_file_already_downloaded(self, file_name):
"""Report file has already been fully downloaded."""

View File

@@ -4,6 +4,7 @@ import os
import re
from .fragment import FragmentFD
from ..compat import compat_urllib_error
from ..utils import (
sanitize_open,
encodeFilename,
@@ -36,20 +37,41 @@ class DashSegmentsFD(FragmentFD):
segments_filenames = []
def append_url_to_file(target_url, target_filename):
success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)})
if not success:
fragment_retries = self.params.get('fragment_retries', 0)
def append_url_to_file(target_url, tmp_filename, segment_name):
target_filename = '%s-%s' % (tmp_filename, segment_name)
count = 0
while count <= fragment_retries:
try:
success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)})
if not success:
return False
down, target_sanitized = sanitize_open(target_filename, 'rb')
ctx['dest_stream'].write(down.read())
down.close()
segments_filenames.append(target_sanitized)
break
except (compat_urllib_error.HTTPError, ) as err:
# YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately
# retried with the same request data this usually succeeds (1-2 attemps
# is usually enough) thus allowing to download the whole file successfully.
# So, we will retry all fragments that fail with 404 HTTP error for now.
if err.code != 404:
raise
# Retry fragment
count += 1
if count <= fragment_retries:
self.report_retry_fragment(segment_name, count, fragment_retries)
if count > fragment_retries:
self.report_error('giving up after %s fragment retries' % fragment_retries)
return False
down, target_sanitized = sanitize_open(target_filename, 'rb')
ctx['dest_stream'].write(down.read())
down.close()
segments_filenames.append(target_sanitized)
if initialization_url:
append_url_to_file(initialization_url, ctx['tmpfilename'] + '-Init')
append_url_to_file(initialization_url, ctx['tmpfilename'], 'Init')
for i, segment_url in enumerate(segment_urls):
segment_filename = '%s-Seg%d' % (ctx['tmpfilename'], i)
append_url_to_file(segment_url, segment_filename)
append_url_to_file(segment_url, ctx['tmpfilename'], 'Seg%d' % i)
self._finish_frag_download(ctx)

View File

@@ -198,12 +198,39 @@ class FFmpegFD(ExternalFD):
'-headers',
''.join('%s: %s\r\n' % (key, val) for key, val in headers.items())]
protocol = info_dict.get('protocol')
if protocol == 'rtmp':
player_url = info_dict.get('player_url')
page_url = info_dict.get('page_url')
app = info_dict.get('app')
play_path = info_dict.get('play_path')
tc_url = info_dict.get('tc_url')
flash_version = info_dict.get('flash_version')
live = info_dict.get('rtmp_live', False)
if player_url is not None:
args += ['-rtmp_swfverify', player_url]
if page_url is not None:
args += ['-rtmp_pageurl', page_url]
if app is not None:
args += ['-rtmp_app', app]
if play_path is not None:
args += ['-rtmp_playpath', play_path]
if tc_url is not None:
args += ['-rtmp_tcurl', tc_url]
if flash_version is not None:
args += ['-rtmp_flashver', flash_version]
if live:
args += ['-rtmp_live', 'live']
args += ['-i', url, '-c', 'copy']
if info_dict.get('protocol') == 'm3u8':
if protocol == 'm3u8':
if self.params.get('hls_use_mpegts', False):
args += ['-f', 'mpegts']
else:
args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
elif protocol == 'rtmp':
args += ['-f', 'flv']
else:
args += ['-f', EXT_TO_OUT_FORMATS.get(info_dict['ext'], info_dict['ext'])]

View File

@@ -223,6 +223,12 @@ def write_metadata_tag(stream, metadata):
write_unsigned_int(stream, FLV_TAG_HEADER_LEN + len(metadata))
def remove_encrypted_media(media):
return list(filter(lambda e: 'drmAdditionalHeaderId' not in e.attrib and
'drmAdditionalHeaderSetId' not in e.attrib,
media))
def _add_ns(prop):
return '{http://ns.adobe.com/f4m/1.0}%s' % prop
@@ -244,9 +250,7 @@ class F4mFD(FragmentFD):
# without drmAdditionalHeaderId or drmAdditionalHeaderSetId attribute
if 'id' not in e.attrib:
self.report_error('Missing ID in f4m DRM')
media = list(filter(lambda e: 'drmAdditionalHeaderId' not in e.attrib and
'drmAdditionalHeaderSetId' not in e.attrib,
media))
media = remove_encrypted_media(media)
if not media:
self.report_error('Unsupported DRM')
return media

View File

@@ -19,8 +19,17 @@ class HttpQuietDownloader(HttpFD):
class FragmentFD(FileDownloader):
"""
A base file downloader class for fragmented media (e.g. f4m/m3u8 manifests).
Available options:
fragment_retries: Number of times to retry a fragment for HTTP error (DASH only)
"""
def report_retry_fragment(self, fragment_name, count, retries):
self.to_screen(
'[download] Got server HTTP error. Retrying fragment %s (attempt %d of %s)...'
% (fragment_name, count, self.format_retries(retries)))
def _prepare_and_start_frag_download(self, ctx):
self._prepare_frag_download(ctx)
self._start_frag_download(ctx)

View File

@@ -72,6 +72,7 @@ from .bet import BetIE
from .bigflix import BigflixIE
from .bild import BildIE
from .bilibili import BiliBiliIE
from .biobiochiletv import BioBioChileTVIE
from .bleacherreport import (
BleacherReportIE,
BleacherReportCMSIE,
@@ -81,6 +82,7 @@ from .bloomberg import BloombergIE
from .bokecc import BokeCCIE
from .bpb import BpbIE
from .br import BRIE
from .bravotv import BravoTVIE
from .breakcom import BreakIE
from .brightcove import (
BrightcoveLegacyIE,
@@ -93,6 +95,7 @@ from .camdemy import (
CamdemyIE,
CamdemyFolderIE
)
from .camwithher import CamWithHerIE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .canvas import CanvasIE
@@ -101,12 +104,14 @@ from .cbc import (
CBCPlayerIE,
)
from .cbs import CBSIE
from .cbsinteractive import CBSInteractiveIE
from .cbsnews import (
CBSNewsIE,
CBSNewsLiveVideoIE,
)
from .cbssports import CBSSportsIE
from .ccc import CCCIE
from .cda import CDAIE
from .ceskatelevize import CeskaTelevizeIE
from .channel9 import Channel9IE
from .chaturbate import ChaturbateIE
@@ -124,7 +129,7 @@ from .cloudy import CloudyIE
from .clubic import ClubicIE
from .clyp import ClypIE
from .cmt import CMTIE
from .cnet import CNETIE
from .cnbc import CNBCIE
from .cnn import (
CNNIE,
CNNBlogsIE,
@@ -135,6 +140,7 @@ from .collegerama import CollegeRamaIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .comcarcoff import ComCarCoffIE
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .commonprotocols import RtmpIE
from .condenast import CondeNastIE
from .cracked import CrackedIE
from .crackle import CrackleIE
@@ -282,6 +288,7 @@ from .goshgay import GoshgayIE
from .gputechconf import GPUTechConfIE
from .groupon import GrouponIE
from .hark import HarkIE
from .hbo import HBOIE
from .hearthisat import HearThisAtIE
from .heise import HeiseIE
from .hellporno import HellPornoIE
@@ -405,6 +412,7 @@ from .mit import TechTVMITIE, MITIE, OCWMITIE
from .mitele import MiTeleIE
from .mixcloud import MixcloudIE
from .mlb import MLBIE
from .mnet import MnetIE
from .mpora import MporaIE
from .moevideo import MoeVideoIE
from .mofosex import MofosexIE
@@ -431,10 +439,14 @@ from .myspass import MySpassIE
from .myvi import MyviIE
from .myvideo import MyVideoIE
from .myvidster import MyVidsterIE
from .nationalgeographic import NationalGeographicIE
from .nationalgeographic import (
NationalGeographicIE,
NationalGeographicChannelIE,
)
from .naver import NaverIE
from .nba import NBAIE
from .nbc import (
CSNNEIE,
NBCIE,
NBCNewsIE,
NBCSportsIE,
@@ -484,11 +496,11 @@ from .normalboots import NormalbootsIE
from .nosvideo import NosVideoIE
from .nova import NovaIE
from .novamov import (
NovaMovIE,
WholeCloudIE,
AuroraVidIE,
CloudTimeIE,
NowVideoIE,
VideoWeedIE,
CloudTimeIE,
WholeCloudIE,
)
from .nowness import (
NownessIE,
@@ -530,6 +542,7 @@ from .ooyala import (
OoyalaIE,
OoyalaExternalIE,
)
from .openload import OpenloadIE
from .ora import OraTVIE
from .orf import (
ORFTVthekIE,
@@ -598,6 +611,7 @@ from .regiotv import RegioTVIE
from .restudy import RestudyIE
from .reverbnation import ReverbNationIE
from .revision3 import Revision3IE
from .rice import RICEIE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
@@ -624,6 +638,7 @@ from .ruutu import RuutuIE
from .sandia import SandiaIE
from .safari import (
SafariIE,
SafariApiIE,
SafariCourseIE,
)
from .sapo import SapoIE
@@ -726,7 +741,6 @@ from .telecinco import TelecincoIE
from .telegraaf import TelegraafIE
from .telemb import TeleMBIE
from .teletask import TeleTaskIE
from .tenplay import TenPlayIE
from .testurl import TestURLIE
from .tf1 import TF1IE
from .theintercept import TheInterceptIE
@@ -735,7 +749,9 @@ from .theplatform import (
ThePlatformIE,
ThePlatformFeedIE,
)
from .thescene import TheSceneIE
from .thesixtyone import TheSixtyOneIE
from .thestar import TheStarIE
from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE
from .tinypic import TinyPicIE
@@ -782,6 +798,7 @@ from .tv2 import (
TV2IE,
TV2ArticleIE,
)
from .tv3 import TV3IE
from .tv4 import TV4IE
from .tvc import (
TVCIE,
@@ -888,6 +905,7 @@ from .vk import (
from .vlive import VLiveIE
from .vodlocker import VodlockerIE
from .voicerepublic import VoiceRepublicIE
from .voxmedia import VoxMediaIE
from .vporn import VpornIE
from .vrt import VRTIE
from .vube import VubeIE
@@ -949,7 +967,9 @@ from .youtube import (
YoutubeChannelIE,
YoutubeFavouritesIE,
YoutubeHistoryIE,
YoutubeLiveIE,
YoutubePlaylistIE,
YoutubePlaylistsIE,
YoutubeRecommendedIE,
YoutubeSearchDateIE,
YoutubeSearchIE,
@@ -959,7 +979,6 @@ from .youtube import (
YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE,
YoutubeUserIE,
YoutubePlaylistsIE,
YoutubeWatchLaterIE,
)
from .zapiks import ZapiksIE

View File

@@ -12,7 +12,7 @@ from ..utils import (
class ABCIE(InfoExtractor):
IE_NAME = 'abc.net.au'
_VALID_URL = r'http://www\.abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
_VALID_URL = r'https?://www\.abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334',

View File

@@ -44,6 +44,7 @@ class Abc7NewsIE(InfoExtractor):
'contentURL', webpage, 'm3u8 url', fatal=True)
formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
self._sort_formats(formats)
title = self._og_search_title(webpage).strip()
description = self._og_search_description(webpage).strip()

View File

@@ -6,7 +6,7 @@ from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
)
from ..utils import (
@@ -16,7 +16,7 @@ from ..utils import (
class AddAnimeIE(InfoExtractor):
_VALID_URL = r'http://(?:\w+\.)?add-anime\.net/(?:watch_video\.php\?(?:.*?)v=|video/)(?P<id>[\w_]+)'
_VALID_URL = r'https?://(?:\w+\.)?add-anime\.net/(?:watch_video\.php\?(?:.*?)v=|video/)(?P<id>[\w_]+)'
_TESTS = [{
'url': 'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
'md5': '72954ea10bc979ab5e2eb288b21425a0',
@@ -60,7 +60,7 @@ class AddAnimeIE(InfoExtractor):
confirm_url = (
parsed_url.scheme + '://' + parsed_url.netloc +
action + '?' +
compat_urllib_parse.urlencode({
compat_urllib_parse_urlencode({
'jschl_vc': vc, 'jschl_answer': compat_str(av_val)}))
self._download_webpage(
confirm_url, video_id,

View File

@@ -1,13 +1,19 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import smuggle_url
from ..utils import (
smuggle_url,
update_url_query,
unescapeHTML,
)
class AENetworksIE(InfoExtractor):
IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
_VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
_VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?P<type>[^/]+)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
_TESTS = [{
'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
@@ -16,6 +22,9 @@ class AENetworksIE(InfoExtractor):
'ext': 'mp4',
'title': "Bet You Didn't Know: Valentine's Day",
'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
'timestamp': 1375819729,
'upload_date': '20130806',
'uploader': 'AENE-NEW',
},
'params': {
# m3u8 download
@@ -25,15 +34,15 @@ class AENetworksIE(InfoExtractor):
'expected_warnings': ['JSON-LD'],
}, {
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
'info_dict': {
'id': 'eg47EERs_JsZ',
'ext': 'mp4',
'title': 'Winter Is Coming',
'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
},
'params': {
# m3u8 download
'skip_download': True,
'timestamp': 1338306241,
'upload_date': '20120529',
'uploader': 'AENE-NEW',
},
'add_ie': ['ThePlatform'],
}, {
@@ -48,7 +57,7 @@ class AENetworksIE(InfoExtractor):
}]
def _real_extract(self, url):
video_id = self._match_id(url)
page_type, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id)
@@ -56,11 +65,23 @@ class AENetworksIE(InfoExtractor):
r'data-href="[^"]*/%s"[^>]+data-release-url="([^"]+)"' % video_id,
r"media_url\s*=\s*'([^']+)'"
]
video_url = self._search_regex(video_url_re, webpage, 'video url')
video_url = unescapeHTML(self._search_regex(video_url_re, webpage, 'video url'))
query = {'mbr': 'true'}
if page_type == 'shows':
query['assetTypes'] = 'medium_video_s3'
if 'switch=hds' in video_url:
query['switch'] = 'hls'
info = self._search_json_ld(webpage, video_id, fatal=False)
info.update({
'_type': 'url_transparent',
'url': smuggle_url(video_url, {'sig': {'key': 'crazyjava', 'secret': 's3cr3t'}}),
'url': smuggle_url(
update_url_query(video_url, query),
{
'sig': {
'key': 'crazyjava',
'secret': 's3cr3t'},
'force_smil_url': True
}),
})
return info

View File

@@ -6,7 +6,7 @@ from ..utils import int_or_none
class AftonbladetIE(InfoExtractor):
_VALID_URL = r'http://tv\.aftonbladet\.se/abtv/articles/(?P<id>[0-9]+)'
_VALID_URL = r'https?://tv\.aftonbladet\.se/abtv/articles/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
'info_dict': {

View File

@@ -4,7 +4,7 @@ from .common import InfoExtractor
class AlJazeeraIE(InfoExtractor):
_VALID_URL = r'http://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
_VALID_URL = r'https?://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
_TEST = {
'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html',
@@ -13,24 +13,18 @@ class AlJazeeraIE(InfoExtractor):
'ext': 'mp4',
'title': 'The Slum - Episode 1: Deliverance',
'description': 'As a birth attendant advocating for family planning, Remy is on the frontline of Tondo\'s battle with overcrowding.',
'uploader': 'Al Jazeera English',
'uploader_id': '665003303001',
'timestamp': 1411116829,
'upload_date': '20140919',
},
'add_ie': ['BrightcoveLegacy'],
'add_ie': ['BrightcoveNew'],
'skip': 'Not accessible from Travis CI server',
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/665003303001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
program_name = self._match_id(url)
webpage = self._download_webpage(url, program_name)
brightcove_id = self._search_regex(
r'RenderPagesVideo\(\'(.+?)\'', webpage, 'brightcove id')
return {
'_type': 'url',
'url': (
'brightcove:'
'playerKey=AQ~~%2CAAAAmtVJIFk~%2CTVGOQ5ZTwJbeMWnq5d_H4MOM57xfzApc'
'&%40videoPlayer={0}'.format(brightcove_id)
),
'ie_key': 'BrightcoveLegacy',
}
return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)

View File

@@ -69,12 +69,14 @@ class AMPIE(InfoExtractor):
self._sort_formats(formats)
timestamp = parse_iso8601(item.get('pubDate'), ' ') or parse_iso8601(item.get('dc-date'))
return {
'id': video_id,
'title': get_media_node('title'),
'description': get_media_node('description'),
'thumbnails': thumbnails,
'timestamp': parse_iso8601(item.get('pubDate'), ' '),
'timestamp': timestamp,
'duration': int_or_none(media_content[0].get('@attributes', {}).get('duration')),
'subtitles': subtitles,
'formats': formats,

View File

@@ -3,10 +3,13 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..compat import (
compat_urlparse,
compat_str,
)
from ..utils import (
determine_ext,
encode_dict,
extract_attributes,
ExtractorError,
sanitized_Request,
urlencode_postdata,
@@ -18,7 +21,7 @@ class AnimeOnDemandIE(InfoExtractor):
_LOGIN_URL = 'https://www.anime-on-demand.de/users/sign_in'
_APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply'
_NETRC_MACHINE = 'animeondemand'
_TEST = {
_TESTS = [{
'url': 'https://www.anime-on-demand.de/anime/161',
'info_dict': {
'id': '161',
@@ -26,7 +29,19 @@ class AnimeOnDemandIE(InfoExtractor):
'description': 'md5:6681ce3c07c7189d255ac6ab23812d31',
},
'playlist_mincount': 4,
}
}, {
# Film wording is used instead of Episode
'url': 'https://www.anime-on-demand.de/anime/39',
'only_matching': True,
}, {
# Episodes without titles
'url': 'https://www.anime-on-demand.de/anime/162',
'only_matching': True,
}, {
# ger/jap, Dub/OmU, account required
'url': 'https://www.anime-on-demand.de/anime/169',
'only_matching': True,
}]
def _login(self):
(username, password) = self._get_login_info()
@@ -36,6 +51,10 @@ class AnimeOnDemandIE(InfoExtractor):
login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page')
if '>Our licensing terms allow the distribution of animes only to German-speaking countries of Europe' in login_page:
self.raise_geo_restricted(
'%s is only available in German-speaking countries of Europe' % self.IE_NAME)
login_form = self._form_hidden_inputs('new_user', login_page)
login_form.update({
@@ -51,7 +70,7 @@ class AnimeOnDemandIE(InfoExtractor):
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
request = sanitized_Request(
post_url, urlencode_postdata(encode_dict(login_form)))
post_url, urlencode_postdata(login_form))
request.add_header('Referer', self._LOGIN_URL)
response = self._download_webpage(
@@ -91,14 +110,22 @@ class AnimeOnDemandIE(InfoExtractor):
entries = []
for episode_html in re.findall(r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', webpage):
m = re.search(
r'class="episodebox-title"[^>]+title="Episode (?P<number>\d+) - (?P<title>.+?)"', episode_html)
if not m:
for num, episode_html in enumerate(re.findall(
r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', webpage), 1):
episodebox_title = self._search_regex(
(r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
episode_html, 'episodebox title', default=None, group='title')
if not episodebox_title:
continue
episode_number = int(m.group('number'))
episode_title = m.group('title')
episode_number = int(self._search_regex(
r'(?:Episode|Film)\s*(\d+)',
episodebox_title, 'episode number', default=num))
episode_title = self._search_regex(
r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
episodebox_title, 'episode title', default=None)
video_id = 'episode-%d' % episode_number
common_info = {
@@ -110,33 +137,86 @@ class AnimeOnDemandIE(InfoExtractor):
formats = []
playlist_url = self._search_regex(
r'data-playlist=(["\'])(?P<url>.+?)\1',
episode_html, 'data playlist', default=None, group='url')
if playlist_url:
request = sanitized_Request(
compat_urlparse.urljoin(url, playlist_url),
headers={
'X-Requested-With': 'XMLHttpRequest',
'X-CSRF-Token': csrf_token,
'Referer': url,
'Accept': 'application/json, text/javascript, */*; q=0.01',
})
for input_ in re.findall(
r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', episode_html):
attributes = extract_attributes(input_)
playlist_urls = []
for playlist_key in ('data-playlist', 'data-otherplaylist'):
playlist_url = attributes.get(playlist_key)
if isinstance(playlist_url, compat_str) and re.match(
r'/?[\da-zA-Z]+', playlist_url):
playlist_urls.append(attributes[playlist_key])
if not playlist_urls:
continue
playlist = self._download_json(
request, video_id, 'Downloading playlist JSON', fatal=False)
if playlist:
playlist = playlist['playlist'][0]
title = playlist['title']
lang = attributes.get('data-lang')
lang_note = attributes.get('value')
for playlist_url in playlist_urls:
kind = self._search_regex(
r'videomaterialurl/\d+/([^/]+)/',
playlist_url, 'media kind', default=None)
format_id_list = []
if lang:
format_id_list.append(lang)
if kind:
format_id_list.append(kind)
if not format_id_list:
format_id_list.append(compat_str(num))
format_id = '-'.join(format_id_list)
format_note = ', '.join(filter(None, (kind, lang_note)))
request = sanitized_Request(
compat_urlparse.urljoin(url, playlist_url),
headers={
'X-Requested-With': 'XMLHttpRequest',
'X-CSRF-Token': csrf_token,
'Referer': url,
'Accept': 'application/json, text/javascript, */*; q=0.01',
})
playlist = self._download_json(
request, video_id, 'Downloading %s playlist JSON' % format_id,
fatal=False)
if not playlist:
continue
start_video = playlist.get('startvideo', 0)
playlist = playlist.get('playlist')
if not playlist or not isinstance(playlist, list):
continue
playlist = playlist[start_video]
title = playlist.get('title')
if not title:
continue
description = playlist.get('description')
for source in playlist.get('sources', []):
file_ = source.get('file')
if file_ and determine_ext(file_) == 'm3u8':
formats = self._extract_m3u8_formats(
if not file_:
continue
ext = determine_ext(file_)
format_id_list = [lang, kind]
if ext == 'm3u8':
format_id_list.append('hls')
elif source.get('type') == 'video/dash' or ext == 'mpd':
format_id_list.append('dash')
format_id = '-'.join(filter(None, format_id_list))
if ext == 'm3u8':
file_formats = self._extract_m3u8_formats(
file_, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
entry_protocol='m3u8_native', m3u8_id=format_id, fatal=False)
elif source.get('type') == 'video/dash' or ext == 'mpd':
continue
file_formats = self._extract_mpd_formats(
file_, video_id, mpd_id=format_id, fatal=False)
else:
continue
for f in file_formats:
f.update({
'language': lang,
'format_note': format_note,
})
formats.extend(file_formats)
if formats:
self._sort_formats(formats)
f = common_info.copy()
f.update({
'title': title,
@@ -145,16 +225,18 @@ class AnimeOnDemandIE(InfoExtractor):
})
entries.append(f)
m = re.search(
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<',
episode_html)
if m:
f = common_info.copy()
f.update({
'id': '%s-teaser' % f['id'],
'title': m.group('title'),
'url': compat_urlparse.urljoin(url, m.group('href')),
})
entries.append(f)
# Extract teaser only when full episode is not available
if not formats:
m = re.search(
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<',
episode_html)
if m:
f = common_info.copy()
f.update({
'id': '%s-teaser' % f['id'],
'title': m.group('title'),
'url': compat_urlparse.urljoin(url, m.group('href')),
})
entries.append(f)
return self.playlist_result(entries, anime_id, anime_title, anime_description)

View File

@@ -5,7 +5,7 @@ from .common import InfoExtractor
class AolIE(InfoExtractor):
IE_NAME = 'on.aol.com'
_VALID_URL = r'(?:aol-video:|http://on\.aol\.com/video/.*-)(?P<id>[0-9]+)(?:$|\?)'
_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/video/.*-)(?P<id>[0-9]+)(?:$|\?)'
_TESTS = [{
'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img',
@@ -25,7 +25,7 @@ class AolIE(InfoExtractor):
class AolFeaturesIE(InfoExtractor):
IE_NAME = 'features.aol.com'
_VALID_URL = r'http://features\.aol\.com/video/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://features\.aol\.com/video/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://features.aol.com/video/behind-secret-second-careers-late-night-talk-show-hosts',

View File

@@ -23,7 +23,7 @@ from ..utils import (
class ArteTvIE(InfoExtractor):
_VALID_URL = r'http://videos\.arte\.tv/(?P<lang>fr|de|en|es)/.*-(?P<id>.*?)\.html'
_VALID_URL = r'https?://videos\.arte\.tv/(?P<lang>fr|de|en|es)/.*-(?P<id>.*?)\.html'
IE_NAME = 'arte.tv'
def _real_extract(self, url):

View File

@@ -6,16 +6,14 @@ import hashlib
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse,
)
from ..compat import compat_str
from ..utils import (
int_or_none,
float_or_none,
sanitized_Request,
xpath_text,
ExtractorError,
float_or_none,
int_or_none,
sanitized_Request,
urlencode_postdata,
xpath_text,
)
@@ -86,7 +84,7 @@ class AtresPlayerIE(InfoExtractor):
}
request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
response = self._download_webpage(
request, None, 'Logging in as %s' % username)

View File

@@ -98,7 +98,7 @@ class AzubuIE(InfoExtractor):
class AzubuLiveIE(InfoExtractor):
_VALID_URL = r'http://www.azubu.tv/(?P<id>[^/]+)$'
_VALID_URL = r'https?://www.azubu.tv/(?P<id>[^/]+)$'
_TEST = {
'url': 'http://www.azubu.tv/MarsTVMDLen',
@@ -120,6 +120,7 @@ class AzubuLiveIE(InfoExtractor):
bc_info = self._download_json(req, user)
m3u8_url = next(source['src'] for source in bc_info['sources'] if source['container'] == 'M2TS')
formats = self._extract_m3u8_formats(m3u8_url, user, ext='mp4')
self._sort_formats(formats)
return {
'id': info['id'],

View File

@@ -9,7 +9,7 @@ from ..utils import unescapeHTML
class BaiduVideoIE(InfoExtractor):
IE_DESC = '百度视频'
_VALID_URL = r'http://v\.baidu\.com/(?P<type>[a-z]+)/(?P<id>\d+)\.htm'
_VALID_URL = r'https?://v\.baidu\.com/(?P<type>[a-z]+)/(?P<id>\d+)\.htm'
_TESTS = [{
'url': 'http://v.baidu.com/comic/1069.htm?frp=bdbrand&q=%E4%B8%AD%E5%8D%8E%E5%B0%8F%E5%BD%93%E5%AE%B6',
'info_dict': {

View File

@@ -4,15 +4,13 @@ import re
import itertools
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_str,
)
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
float_or_none,
int_or_none,
sanitized_Request,
urlencode_postdata,
)
@@ -58,7 +56,7 @@ class BambuserIE(InfoExtractor):
}
request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Referer', self._LOGIN_URL)
response = self._download_webpage(
request, None, 'Logging in as %s' % username)

View File

@@ -328,6 +328,7 @@ class BBCCoUkIE(InfoExtractor):
'format_id': '%s_%s' % (service, format['format_id']),
'abr': abr,
'acodec': acodec,
'vcodec': 'none',
})
formats.extend(conn_formats)
return formats
@@ -688,6 +689,10 @@ class BBCIE(BBCCoUkIE):
# custom redirection to www.bbc.com
'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
'only_matching': True,
}, {
# single video article embedded with data-media-vpid
'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
'only_matching': True,
}]
@classmethod
@@ -817,7 +822,7 @@ class BBCIE(BBCCoUkIE):
# single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
programme_id = self._search_regex(
[r'data-video-player-vpid="(%s)"' % self._ID_REGEX,
[r'data-(?:video-player|media)-vpid="(%s)"' % self._ID_REGEX,
r'<param[^>]+name="externalIdentifier"[^>]+value="(%s)"' % self._ID_REGEX,
r'videoId\s*:\s*["\'](%s)["\']' % self._ID_REGEX],
webpage, 'vpid', default=None)
@@ -942,7 +947,7 @@ class BBCIE(BBCCoUkIE):
class BBCCoUkArticleIE(InfoExtractor):
_VALID_URL = 'http://www.bbc.co.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
_VALID_URL = r'https?://www.bbc.co.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
IE_NAME = 'bbc.co.uk:article'
IE_DESC = 'BBC articles'

View File

@@ -34,7 +34,7 @@ class BeegIE(InfoExtractor):
video_id = self._match_id(url)
video = self._download_json(
'https://api.beeg.com/api/v5/video/%s' % video_id, video_id)
'https://api.beeg.com/api/v6/1738/video/%s' % video_id, video_id)
def split(o, e):
def cut(s, x):
@@ -50,8 +50,8 @@ class BeegIE(InfoExtractor):
return n
def decrypt_key(key):
# Reverse engineered from http://static.beeg.com/cpl/1105.js
a = '5ShMcIQlssOd7zChAIOlmeTZDaUxULbJRnywYaiB'
# Reverse engineered from http://static.beeg.com/cpl/1738.js
a = 'GUuyodcfS8FW8gQp4OKLMsZBcX0T7B'
e = compat_urllib_parse_unquote(key)
o = ''.join([
compat_chr(compat_ord(e[n]) - compat_ord(a[n % len(a)]) % 21)

View File

@@ -8,7 +8,7 @@ from ..utils import url_basename
class BehindKinkIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?behindkink\.com/(?P<year>[0-9]{4})/(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/(?P<id>[^/#?_]+)'
_VALID_URL = r'https?://(?:www\.)?behindkink\.com/(?P<year>[0-9]{4})/(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/(?P<id>[^/#?_]+)'
_TEST = {
'url': 'http://www.behindkink.com/2014/12/05/what-are-you-passionate-about-marley-blaze/',
'md5': '507b57d8fdcd75a41a9a7bdb7989c762',

View File

@@ -94,6 +94,7 @@ class BetIE(InfoExtractor):
xpath_with_ns('./media:thumbnail', NS_MAP)).get('url')
formats = self._extract_smil_formats(smil_url, display_id)
self._sort_formats(formats)
return {
'id': video_id,

View File

@@ -14,7 +14,7 @@ from ..utils import (
class BiliBiliIE(InfoExtractor):
_VALID_URL = r'http://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)(?:/index_(?P<page_num>\d+).html)?'
_VALID_URL = r'https?://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)(?:/index_(?P<page_num>\d+).html)?'
_TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/',

View File

@@ -0,0 +1,86 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import remove_end
class BioBioChileTVIE(InfoExtractor):
_VALID_URL = r'https?://tv\.biobiochile\.cl/notas/(?:[^/]+/)+(?P<id>[^/]+)\.shtml'
_TESTS = [{
'url': 'http://tv.biobiochile.cl/notas/2015/10/21/sobre-camaras-y-camarillas-parlamentarias.shtml',
'md5': '26f51f03cf580265defefb4518faec09',
'info_dict': {
'id': 'sobre-camaras-y-camarillas-parlamentarias',
'ext': 'mp4',
'title': 'Sobre Cámaras y camarillas parlamentarias',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Fernando Atria',
},
}, {
# different uploader layout
'url': 'http://tv.biobiochile.cl/notas/2016/03/18/natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades.shtml',
'md5': 'edc2e6b58974c46d5b047dea3c539ff3',
'info_dict': {
'id': 'natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades',
'ext': 'mp4',
'title': 'Natalia Valdebenito repasa a diputado Hasbún: Pasó a la categoría de hablar brutalidades',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Piangella Obrador',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
'only_matching': True,
}, {
'url': 'http://tv.biobiochile.cl/notas/2015/10/21/exclusivo-hector-pinto-formador-de-chupete-revela-version-del-ex-delantero-albo.shtml',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = remove_end(self._og_search_title(webpage), ' - BioBioChile TV')
file_url = self._search_regex(
r'loadFWPlayerVideo\([^,]+,\s*(["\'])(?P<url>.+?)\1',
webpage, 'file url', group='url')
base_url = self._search_regex(
r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*fileURL', webpage,
'base url', default='http://unlimited2-cl.digitalproserver.com/bbtv/',
group='url')
formats = self._extract_m3u8_formats(
'%s%s/playlist.m3u8' % (base_url, file_url), video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
f = {
'url': '%s%s' % (base_url, file_url),
'format_id': 'http',
'protocol': 'http',
'preference': 1,
}
if formats:
f_copy = formats[-1].copy()
f_copy.update(f)
f = f_copy
formats.append(f)
self._sort_formats(formats)
thumbnail = self._og_search_thumbnail(webpage)
uploader = self._html_search_regex(
r'<a[^>]+href=["\']https?://busca\.biobiochile\.cl/author[^>]+>(.+?)</a>',
webpage, 'uploader', fatal=False)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'uploader': uploader,
'formats': formats,
}

View File

@@ -33,7 +33,7 @@ class BokeCCBaseIE(InfoExtractor):
class BokeCCIE(BokeCCBaseIE):
_IE_DESC = 'CC视频'
_VALID_URL = r'http://union\.bokecc\.com/playvideo\.bo\?(?P<query>.*)'
_VALID_URL = r'https?://union\.bokecc\.com/playvideo\.bo\?(?P<query>.*)'
_TESTS = [{
'url': 'http://union.bokecc.com/playvideo.bo?vid=E44D40C15E65EA30&uid=CD0C5D3C8614B28B',

View File

@@ -12,7 +12,7 @@ from ..utils import (
class BpbIE(InfoExtractor):
IE_DESC = 'Bundeszentrale für politische Bildung'
_VALID_URL = r'http://www\.bpb\.de/mediathek/(?P<id>[0-9]+)/'
_VALID_URL = r'https?://www\.bpb\.de/mediathek/(?P<id>[0-9]+)/'
_TEST = {
'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr',

View File

@@ -0,0 +1,31 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
class BravoTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+videos/(?P<id>[^/?]+)'
_TEST = {
'url': 'http://www.bravotv.com/last-chance-kitchen/season-5/videos/lck-ep-12-fishy-finale',
'md5': 'd60cdf68904e854fac669bd26cccf801',
'info_dict': {
'id': 'LitrBdX64qLn',
'ext': 'mp4',
'title': 'Last Chance Kitchen Returns',
'description': 'S13: Last Chance Kitchen Returns for Top Chef Season 13',
'timestamp': 1448926740,
'upload_date': '20151130',
'uploader': 'NBCU-BRAV',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
account_pid = self._search_regex(r'"account_pid"\s*:\s*"([^"]+)"', webpage, 'account pid')
release_pid = self._search_regex(r'"release_pid"\s*:\s*"([^"]+)"', webpage, 'release pid')
return self.url_result(smuggle_url(
'http://link.theplatform.com/s/%s/%s?mbr=true&switch=progressive' % (account_pid, release_pid),
{'force_smil_url': True}), 'ThePlatform', release_pid)

View File

@@ -11,7 +11,7 @@ from ..utils import (
class BreakIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?break\.com/video/(?:[^/]+/)*.+-(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?break\.com/video/(?:[^/]+/)*.+-(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.break.com/video/when-girls-act-like-guys-2468056',
'info_dict': {

View File

@@ -9,10 +9,10 @@ from ..compat import (
compat_etree_fromstring,
compat_parse_qs,
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlparse,
compat_urlparse,
compat_xml_parse_error,
compat_HTTPError,
)
from ..utils import (
determine_ext,
@@ -23,16 +23,16 @@ from ..utils import (
js_to_json,
int_or_none,
parse_iso8601,
sanitized_Request,
unescapeHTML,
unsmuggle_url,
update_url_query,
)
class BrightcoveLegacyIE(InfoExtractor):
IE_NAME = 'brightcove:legacy'
_VALID_URL = r'(?:https?://.*brightcove\.com/(services|viewer).*?\?|brightcove:)(?P<query>.*)'
_FEDERATED_URL_TEMPLATE = 'http://c.brightcove.com/services/viewer/htmlFederated?%s'
_FEDERATED_URL = 'http://c.brightcove.com/services/viewer/htmlFederated'
_TESTS = [
{
@@ -46,6 +46,9 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
'uploader': '8TV',
'description': 'md5:a950cc4285c43e44d763d036710cd9cd',
'timestamp': 1368213670,
'upload_date': '20130510',
'uploader_id': '1589608506001',
}
},
{
@@ -57,6 +60,9 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'JVMLS 2012: Arrays 2.0 - Opportunities and Challenges',
'description': 'John Rose speaks at the JVM Language Summit, August 1, 2012.',
'uploader': 'Oracle',
'timestamp': 1344975024,
'upload_date': '20120814',
'uploader_id': '1460825906',
},
},
{
@@ -68,6 +74,9 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'This Bracelet Acts as a Personal Thermostat',
'description': 'md5:547b78c64f4112766ccf4e151c20b6a0',
'uploader': 'Mashable',
'timestamp': 1382041798,
'upload_date': '20131017',
'uploader_id': '1130468786001',
},
},
{
@@ -85,14 +94,17 @@ class BrightcoveLegacyIE(InfoExtractor):
{
# test flv videos served by akamaihd.net
# From http://www.redbull.com/en/bike/stories/1331655643987/replay-uci-dh-world-cup-2014-from-fort-william
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?%40videoPlayer=ref%3ABC2996102916001&linkBaseURL=http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fvideos%2F1331655630249%2Freplay-uci-fort-william-2014-dh&playerKey=AQ%7E%7E%2CAAAApYJ7UqE%7E%2Cxqr_zXk0I-zzNndy8NlHogrCb5QdyZRf&playerID=1398061561001#__youtubedl_smuggle=%7B%22Referer%22%3A+%22http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fstories%2F1331655643987%2Freplay-uci-dh-world-cup-2014-from-fort-william%22%7D',
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?%40videoPlayer=ref%3Aevent-stream-356&linkBaseURL=http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fvideos%2F1331655630249%2Freplay-uci-fort-william-2014-dh&playerKey=AQ%7E%7E%2CAAAApYJ7UqE%7E%2Cxqr_zXk0I-zzNndy8NlHogrCb5QdyZRf&playerID=1398061561001#__youtubedl_smuggle=%7B%22Referer%22%3A+%22http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fstories%2F1331655643987%2Freplay-uci-dh-world-cup-2014-from-fort-william%22%7D',
# The md5 checksum changes on each download
'info_dict': {
'id': '2996102916001',
'id': '3750436379001',
'ext': 'flv',
'title': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
'uploader': 'Red Bull TV',
'uploader': 'RBTV Old (do not use)',
'description': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
'timestamp': 1409122195,
'upload_date': '20140827',
'uploader_id': '710858724001',
},
},
{
@@ -106,6 +118,12 @@ class BrightcoveLegacyIE(InfoExtractor):
'playlist_mincount': 7,
},
]
FLV_VCODECS = {
1: 'SORENSON',
2: 'ON2',
3: 'H264',
4: 'VP8',
}
@classmethod
def _build_brighcove_url(cls, object_str):
@@ -136,13 +154,16 @@ class BrightcoveLegacyIE(InfoExtractor):
else:
flashvars = {}
data_url = object_doc.attrib.get('data', '')
data_url_params = compat_parse_qs(compat_urllib_parse_urlparse(data_url).query)
def find_param(name):
if name in flashvars:
return flashvars[name]
node = find_xpath_attr(object_doc, './param', 'name', name)
if node is not None:
return node.attrib['value']
return None
return data_url_params.get(name)
params = {}
@@ -155,8 +176,8 @@ class BrightcoveLegacyIE(InfoExtractor):
# Not all pages define this value
if playerKey is not None:
params['playerKey'] = playerKey
# The three fields hold the id of the video
videoPlayer = find_param('@videoPlayer') or find_param('videoId') or find_param('videoID')
# These fields hold the id of the video
videoPlayer = find_param('@videoPlayer') or find_param('videoId') or find_param('videoID') or find_param('@videoList')
if videoPlayer is not None:
params['@videoPlayer'] = videoPlayer
linkBase = find_param('linkBaseURL')
@@ -184,8 +205,7 @@ class BrightcoveLegacyIE(InfoExtractor):
@classmethod
def _make_brightcove_url(cls, params):
data = compat_urllib_parse.urlencode(params)
return cls._FEDERATED_URL_TEMPLATE % data
return update_url_query(cls._FEDERATED_URL, params)
@classmethod
def _extract_brightcove_url(cls, webpage):
@@ -239,7 +259,7 @@ class BrightcoveLegacyIE(InfoExtractor):
# We set the original url as the default 'Referer' header
referer = smuggled_data.get('Referer', url)
return self._get_video_info(
videoPlayer[0], query_str, query, referer=referer)
videoPlayer[0], query, referer=referer)
elif 'playerKey' in query:
player_key = query['playerKey']
return self._get_playlist_info(player_key[0])
@@ -248,15 +268,14 @@ class BrightcoveLegacyIE(InfoExtractor):
'Cannot find playerKey= variable. Did you forget quotes in a shell invocation?',
expected=True)
def _get_video_info(self, video_id, query_str, query, referer=None):
request_url = self._FEDERATED_URL_TEMPLATE % query_str
req = sanitized_Request(request_url)
def _get_video_info(self, video_id, query, referer=None):
headers = {}
linkBase = query.get('linkBaseURL')
if linkBase is not None:
referer = linkBase[0]
if referer is not None:
req.add_header('Referer', referer)
webpage = self._download_webpage(req, video_id)
headers['Referer'] = referer
webpage = self._download_webpage(self._FEDERATED_URL, video_id, headers=headers, query=query)
error_msg = self._html_search_regex(
r"<h1>We're sorry.</h1>([\s\n]*<p>.*?</p>)+", webpage,
@@ -288,15 +307,19 @@ class BrightcoveLegacyIE(InfoExtractor):
playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
def _extract_video_info(self, video_info):
publisher_id = video_info.get('publisherId')
info = {
'id': compat_str(video_info['id']),
'title': video_info['displayName'].strip(),
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
'uploader_id': compat_str(publisher_id) if publisher_id else None,
'duration': float_or_none(video_info.get('length'), 1000),
'timestamp': int_or_none(video_info.get('creationDate'), 1000),
}
renditions = video_info.get('renditions')
renditions = video_info.get('renditions', []) + video_info.get('IOSRenditions', [])
if renditions:
formats = []
for rend in renditions:
@@ -317,19 +340,42 @@ class BrightcoveLegacyIE(InfoExtractor):
ext = 'flv'
if ext is None:
ext = determine_ext(url)
size = rend.get('size')
formats.append({
tbr = int_or_none(rend.get('encodingRate'), 1000),
a_format = {
'format_id': 'http%s' % ('-%s' % tbr if tbr else ''),
'url': url,
'ext': ext,
'height': rend.get('frameHeight'),
'width': rend.get('frameWidth'),
'filesize': size if size != 0 else None,
})
'filesize': int_or_none(rend.get('size')) or None,
'tbr': tbr,
}
if rend.get('audioOnly'):
a_format.update({
'vcodec': 'none',
})
else:
a_format.update({
'height': int_or_none(rend.get('frameHeight')),
'width': int_or_none(rend.get('frameWidth')),
'vcodec': rend.get('videoCodec'),
})
# m3u8 manifests with remote == false are media playlists
# Not calling _extract_m3u8_formats here to save network traffic
if ext == 'm3u8':
a_format.update({
'format_id': 'hls%s' % ('-%s' % tbr if tbr else ''),
'ext': 'mp4',
'protocol': 'm3u8',
})
formats.append(a_format)
self._sort_formats(formats)
info['formats'] = formats
elif video_info.get('FLVFullLengthURL') is not None:
info.update({
'url': video_info['FLVFullLengthURL'],
'vcodec': self.FLV_VCODECS.get(video_info.get('FLVFullCodec')),
'filesize': int_or_none(video_info.get('FLVFullSize')),
})
if self._downloader.params.get('include_ads', False):
@@ -355,7 +401,7 @@ class BrightcoveLegacyIE(InfoExtractor):
class BrightcoveNewIE(InfoExtractor):
IE_NAME = 'brightcove:new'
_VALID_URL = r'https?://players\.brightcove\.net/(?P<account_id>\d+)/(?P<player_id>[^/]+)_(?P<embed>[^/]+)/index\.html\?.*videoId=(?P<video_id>(?:ref:)?\d+)'
_VALID_URL = r'https?://players\.brightcove\.net/(?P<account_id>\d+)/(?P<player_id>[^/]+)_(?P<embed>[^/]+)/index\.html\?.*videoId=(?P<video_id>\d+|ref:[^&]+)'
_TESTS = [{
'url': 'http://players.brightcove.net/929656772001/e41d32dc-ec74-459e-a845-6c69f7b724ea_default/index.html?videoId=4463358922001',
'md5': 'c8100925723840d4b0d243f7025703be',
@@ -385,12 +431,17 @@ class BrightcoveNewIE(InfoExtractor):
'formats': 'mincount:41',
},
'params': {
# m3u8 download
'skip_download': True,
}
}, {
# ref: prefixed video id
'url': 'http://players.brightcove.net/3910869709001/21519b5c-4b3b-4363-accb-bdc8f358f823_default/index.html?videoId=ref:7069442',
'only_matching': True,
}, {
# non numeric ref: prefixed video id
'url': 'http://players.brightcove.net/710858724001/default_default/index.html?videoId=ref:event-stream-356',
'only_matching': True,
}]
@staticmethod
@@ -410,8 +461,8 @@ class BrightcoveNewIE(InfoExtractor):
# Look for iframe embeds [1]
for _, url in re.findall(
r'<iframe[^>]+src=(["\'])((?:https?:)//players\.brightcove\.net/\d+/[^/]+/index\.html.+?)\1', webpage):
entries.append(url)
r'<iframe[^>]+src=(["\'])((?:https?:)?//players\.brightcove\.net/\d+/[^/]+/index\.html.+?)\1', webpage):
entries.append(url if url.startswith('http') else 'http:' + url)
# Look for embed_in_page embeds [2]
for video_id, account_id, player_id, embed in re.findall(
@@ -420,11 +471,11 @@ class BrightcoveNewIE(InfoExtractor):
# According to [4] data-video-id may be prefixed with ref:
r'''(?sx)
<video[^>]+
data-video-id=["\']((?:ref:)?\d+)["\'][^>]*>.*?
data-video-id=["\'](\d+|ref:[^"\']+)["\'][^>]*>.*?
</video>.*?
<script[^>]+
src=["\'](?:https?:)?//players\.brightcove\.net/
(\d+)/([\da-f-]+)_([^/]+)/index\.min\.js
(\d+)/([^/]+)_([^/]+)/index(?:\.min)?\.js
''', webpage):
entries.append(
'http://players.brightcove.net/%s/%s_%s/index.html?videoId=%s'
@@ -454,24 +505,33 @@ class BrightcoveNewIE(InfoExtractor):
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk')
req = sanitized_Request(
'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s'
% (account_id, video_id),
headers={'Accept': 'application/json;pk=%s' % policy_key})
json_data = self._download_json(req, video_id)
api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s' % (account_id, video_id)
try:
json_data = self._download_json(api_url, video_id, headers={
'Accept': 'application/json;pk=%s' % policy_key
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
json_data = self._parse_json(e.cause.read().decode(), video_id)
raise ExtractorError(json_data[0]['message'], expected=True)
raise
title = json_data['name']
title = json_data['name'].strip()
formats = []
for source in json_data.get('sources', []):
container = source.get('container')
source_type = source.get('type')
src = source.get('src')
if source_type == 'application/x-mpegURL':
if source_type == 'application/x-mpegURL' or container == 'M2TS':
if not src:
continue
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
src, video_id, 'mp4', m3u8_id='hls', fatal=False))
elif source_type == 'application/dash+xml':
if not src:
continue
formats.extend(self._extract_mpd_formats(src, video_id, 'dash', fatal=False))
else:
streaming_src = source.get('streaming_src')
stream_name, app_name = source.get('stream_name'), source.get('app_name')
@@ -479,15 +539,23 @@ class BrightcoveNewIE(InfoExtractor):
continue
tbr = float_or_none(source.get('avg_bitrate'), 1000)
height = int_or_none(source.get('height'))
width = int_or_none(source.get('width'))
f = {
'tbr': tbr,
'width': int_or_none(source.get('width')),
'height': height,
'filesize': int_or_none(source.get('size')),
'container': source.get('container'),
'vcodec': source.get('codec'),
'ext': source.get('container').lower(),
'container': container,
'ext': container.lower(),
}
if width == 0 and height == 0:
f.update({
'vcodec': 'none',
})
else:
f.update({
'width': width,
'height': height,
'vcodec': source.get('codec'),
})
def build_format_id(kind):
format_id = kind
@@ -501,7 +569,7 @@ class BrightcoveNewIE(InfoExtractor):
f.update({
'url': src or streaming_src,
'format_id': build_format_id('http' if src else 'http-streaming'),
'preference': 2 if src else 1,
'source_preference': 0 if src else -1,
})
else:
f.update({
@@ -512,20 +580,22 @@ class BrightcoveNewIE(InfoExtractor):
formats.append(f)
self._sort_formats(formats)
description = json_data.get('description')
thumbnail = json_data.get('thumbnail')
timestamp = parse_iso8601(json_data.get('published_at'))
duration = float_or_none(json_data.get('duration'), 1000)
tags = json_data.get('tags', [])
subtitles = {}
for text_track in json_data.get('text_tracks', []):
if text_track.get('src'):
subtitles.setdefault(text_track.get('srclang'), []).append({
'url': text_track['src'],
})
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'description': json_data.get('description'),
'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'duration': float_or_none(json_data.get('duration'), 1000),
'timestamp': parse_iso8601(json_data.get('published_at')),
'uploader_id': account_id,
'formats': formats,
'tags': tags,
'subtitles': subtitles,
'tags': json_data.get('tags', []),
}

View File

@@ -6,7 +6,7 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import (
@@ -16,7 +16,7 @@ from ..utils import (
class CamdemyIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?camdemy\.com/media/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?camdemy\.com/media/(?P<id>\d+)'
_TESTS = [{
# single file
'url': 'http://www.camdemy.com/media/5181/',
@@ -104,7 +104,7 @@ class CamdemyIE(InfoExtractor):
class CamdemyFolderIE(InfoExtractor):
_VALID_URL = r'http://www.camdemy.com/folder/(?P<id>\d+)'
_VALID_URL = r'https?://www.camdemy.com/folder/(?P<id>\d+)'
_TESTS = [{
# links with trailing slash
'url': 'http://www.camdemy.com/folder/450',
@@ -139,7 +139,7 @@ class CamdemyFolderIE(InfoExtractor):
parsed_url = list(compat_urlparse.urlparse(url))
query = dict(compat_urlparse.parse_qsl(parsed_url[4]))
query.update({'displayMode': 'list'})
parsed_url[4] = compat_urllib_parse.urlencode(query)
parsed_url[4] = compat_urllib_parse_urlencode(query)
final_url = compat_urlparse.urlunparse(parsed_url)
page = self._download_webpage(final_url, folder_id)

View File

@@ -0,0 +1,87 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
unified_strdate,
)
class CamWithHerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?camwithher\.tv/view_video\.php\?.*\bviewkey=(?P<id>\w+)'
_TESTS = [{
'url': 'http://camwithher.tv/view_video.php?viewkey=6e9a24e2c0e842e1f177&page=&viewtype=&category=',
'info_dict': {
'id': '5644',
'ext': 'flv',
'title': 'Periscope Tease',
'description': 'In the clouds teasing on periscope to my favorite song',
'duration': 240,
'view_count': int,
'comment_count': int,
'uploader': 'MileenaK',
'upload_date': '20160322',
},
'params': {
'skip_download': True,
}
}, {
'url': 'http://camwithher.tv/view_video.php?viewkey=6dfd8b7c97531a459937',
'only_matching': True,
}, {
'url': 'http://camwithher.tv/view_video.php?page=&viewkey=6e9a24e2c0e842e1f177&viewtype=&category=',
'only_matching': True,
}, {
'url': 'http://camwithher.tv/view_video.php?viewkey=b6c3b5bea9515d1a1fc4&page=&viewtype=&category=mv',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
flv_id = self._html_search_regex(
r'<a[^>]+href=["\']/download/\?v=(\d+)', webpage, 'video id')
# Video URL construction algorithm is reverse-engineered from cwhplayer.swf
rtmp_url = 'rtmp://camwithher.tv/clipshare/%s' % (
('mp4:%s.mp4' % flv_id) if int(flv_id) > 2010 else flv_id)
title = self._html_search_regex(
r'<div[^>]+style="float:left"[^>]*>\s*<h2>(.+?)</h2>', webpage, 'title')
description = self._html_search_regex(
r'>Description:</span>(.+?)</div>', webpage, 'description', default=None)
runtime = self._search_regex(
r'Runtime\s*:\s*(.+?) \|', webpage, 'duration', default=None)
if runtime:
runtime = re.sub(r'[\s-]', '', runtime)
duration = parse_duration(runtime)
view_count = int_or_none(self._search_regex(
r'Views\s*:\s*(\d+)', webpage, 'view count', default=None))
comment_count = int_or_none(self._search_regex(
r'Comments\s*:\s*(\d+)', webpage, 'comment count', default=None))
uploader = self._search_regex(
r'Added by\s*:\s*<a[^>]+>([^<]+)</a>', webpage, 'uploader', default=None)
upload_date = unified_strdate(self._search_regex(
r'Added on\s*:\s*([\d-]+)', webpage, 'upload date', default=None))
return {
'id': flv_id,
'url': rtmp_url,
'ext': 'flv',
'no_resume': True,
'title': title,
'description': description,
'duration': duration,
'view_count': view_count,
'comment_count': comment_count,
'uploader': uploader,
'upload_date': upload_date,
}

View File

@@ -1,24 +1,41 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .theplatform import ThePlatformIE
from ..utils import (
sanitized_Request,
smuggle_url,
xpath_text,
xpath_element,
int_or_none,
ExtractorError,
find_xpath_attr,
)
class CBSIE(InfoExtractor):
class CBSBaseIE(ThePlatformIE):
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
return {
'en': [{
'ext': 'ttml',
'url': closed_caption_e.attrib['value'],
}]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
class CBSIE(CBSBaseIE):
_VALID_URL = r'https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
'info_dict': {
'id': '4JUVEwq3wUT7',
'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_',
'display_id': 'connect-chat-feat-garth-brooks',
'ext': 'flv',
'ext': 'mp4',
'title': 'Connect Chat feat. Garth Brooks',
'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
'duration': 1495,
'timestamp': 1385585425,
'upload_date': '20131127',
'uploader': 'CBSI-NEW',
},
'params': {
# rtmp download
@@ -47,22 +64,46 @@ class CBSIE(InfoExtractor):
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True,
}]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?manifest=m3u&mbr=true'
def _real_extract(self, url):
display_id = self._match_id(url)
request = sanitized_Request(url)
# Android UA is served with higher quality (720p) streams (see
# https://github.com/rg3/youtube-dl/issues/7490)
request.add_header('User-Agent', 'Mozilla/5.0 (Linux; Android 4.4; Nexus 5)')
webpage = self._download_webpage(request, display_id)
real_id = self._search_regex(
[r"video\.settings\.pid\s*=\s*'([^']+)';", r"cbsplayer\.pid\s*=\s*'([^']+)';"],
webpage, 'real video ID')
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(
'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true&manifest=m3u' % real_id,
{'force_smil_url': True}),
webpage = self._download_webpage(url, display_id)
content_id = self._search_regex(
[r"video\.settings\.content_id\s*=\s*'([^']+)';", r"cbsplayer\.contentId\s*=\s*'([^']+)';"],
webpage, 'content id')
items_data = self._download_xml(
'http://can.cbs.com/thunder/player/videoPlayerService.php',
content_id, query={'partner': 'cbs', 'contentId': content_id})
video_data = xpath_element(items_data, './/item')
title = xpath_text(video_data, 'videoTitle', 'title', True)
subtitles = {}
formats = []
for item in items_data.findall('.//item'):
pid = xpath_text(item, 'pid')
if not pid:
continue
try:
tp_formats, tp_subtitles = self._extract_theplatform_smil(
self.TP_RELEASE_URL_TEMPLATE % pid, content_id, 'Downloading %s SMIL data' % pid)
except ExtractorError:
continue
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
info = self.get_metadata('dJ5BDC/media/guid/2198311517/%s' % content_id, content_id)
info.update({
'id': content_id,
'display_id': display_id,
}
'title': title,
'series': xpath_text(video_data, 'seriesTitle'),
'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
'thumbnail': xpath_text(video_data, 'previewImageURL'),
'formats': formats,
'subtitles': subtitles,
})
return info

View File

@@ -1,12 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .theplatform import ThePlatformIE
from ..utils import int_or_none
class CNETIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?cnet\.com/videos/(?P<id>[^/]+)/'
class CBSInteractiveIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?(?P<site>cnet|zdnet)\.com/(?:videos|video/share)/(?P<id>[^/?]+)'
_TESTS = [{
'url': 'http://www.cnet.com/videos/hands-on-with-microsofts-windows-8-1-update/',
'info_dict': {
@@ -17,6 +19,8 @@ class CNETIE(ThePlatformIE):
'uploader_id': '6085384d-619e-11e3-b231-14feb5ca9861',
'uploader': 'Sarah Mitroff',
'duration': 70,
'timestamp': 1396479627,
'upload_date': '20140402',
},
}, {
'url': 'http://www.cnet.com/videos/whiny-pothole-tweets-at-local-government-when-hit-by-cars-tomorrow-daily-187/',
@@ -28,15 +32,38 @@ class CNETIE(ThePlatformIE):
'uploader_id': 'b163284d-6b73-44fc-b3e6-3da66c392d40',
'uploader': 'Ashley Esqueda',
'duration': 1482,
'timestamp': 1433289889,
'upload_date': '20150603',
},
}, {
'url': 'http://www.zdnet.com/video/share/video-keeping-android-smartphones-and-tablets-secure/',
'info_dict': {
'id': 'bc1af9f0-a2b5-4e54-880d-0d95525781c0',
'ext': 'mp4',
'title': 'Video: Keeping Android smartphones and tablets secure',
'description': 'Here\'s the best way to keep Android devices secure, and what you do when they\'ve come to the end of their lives.',
'uploader_id': 'f2d97ea2-8175-11e2-9d12-0018fe8a00b0',
'uploader': 'Adrian Kingsley-Hughes',
'timestamp': 1448961720,
'upload_date': '20151201',
},
'params': {
# m3u8 download
'skip_download': True,
}
}]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/kYEXFC/%s?mbr=true'
MPX_ACCOUNTS = {
'cnet': 2288573011,
'zdnet': 2387448114,
}
def _real_extract(self, url):
display_id = self._match_id(url)
site, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
data_json = self._html_search_regex(
r"data-cnet-video(?:-uvp)?-options='([^']+)'",
r"data-(?:cnet|zdnet)-video(?:-uvp)?-options='([^']+)'",
webpage, 'data json')
data = self._parse_json(data_json, display_id)
vdata = data.get('video') or data['videos'][0]
@@ -51,16 +78,15 @@ class CNETIE(ThePlatformIE):
uploader = None
uploader_id = None
metadata = self.get_metadata('kYEXFC/%s' % list(vdata['files'].values())[0], video_id)
description = vdata.get('description') or metadata.get('description')
duration = int_or_none(vdata.get('duration')) or metadata.get('duration')
formats = []
subtitles = {}
media_guid_path = 'media/guid/%d/%s' % (self.MPX_ACCOUNTS[site], vdata['mpxRefId'])
formats, subtitles = [], {}
if site == 'cnet':
formats, subtitles = self._extract_theplatform_smil(
self.TP_RELEASE_URL_TEMPLATE % media_guid_path, video_id)
for (fkey, vid) in vdata['files'].items():
if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']:
continue
release_url = 'http://link.theplatform.com/s/kYEXFC/%s?format=SMIL&mbr=true' % vid
release_url = self.TP_RELEASE_URL_TEMPLATE % vid
if fkey == 'hds':
release_url += '&manifest=f4m'
tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % fkey)
@@ -68,15 +94,15 @@ class CNETIE(ThePlatformIE):
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
return {
info = self.get_metadata('kYEXFC/%s' % media_guid_path, video_id)
info.update({
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': metadata.get('thumbnail'),
'duration': duration,
'duration': int_or_none(vdata.get('duration')),
'uploader': uploader,
'uploader_id': uploader_id,
'subtitles': subtitles,
'formats': formats,
}
})
return info

View File

@@ -2,16 +2,15 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .theplatform import ThePlatformIE
from .cbs import CBSBaseIE
from ..utils import (
parse_duration,
find_xpath_attr,
)
class CBSNewsIE(ThePlatformIE):
class CBSNewsIE(CBSBaseIE):
IE_DESC = 'CBS News'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
_TESTS = [
{
@@ -49,15 +48,6 @@ class CBSNewsIE(ThePlatformIE):
},
]
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
return {
'en': [{
'ext': 'ttml',
'url': closed_caption_e.attrib['value'],
}]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -78,7 +68,7 @@ class CBSNewsIE(ThePlatformIE):
pid = item.get('media' + format_id)
if not pid:
continue
release_url = 'http://link.theplatform.com/s/dJ5BDC/%s?format=SMIL&mbr=true' % pid
release_url = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true' % pid
tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % pid)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
@@ -96,7 +86,7 @@ class CBSNewsIE(ThePlatformIE):
class CBSNewsLiveVideoIE(InfoExtractor):
IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
_TEST = {
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
@@ -122,6 +112,7 @@ class CBSNewsLiveVideoIE(InfoExtractor):
for entry in f4m_formats:
# URLs without the extra param induce an 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
self._sort_formats(f4m_formats)
return {
'id': video_id,

View File

@@ -6,7 +6,7 @@ from .common import InfoExtractor
class CBSSportsIE(InfoExtractor):
_VALID_URL = r'http://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)'
_VALID_URL = r'https?://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)'
_TEST = {
'url': 'http://www.cbssports.com/video/player/tennis/318462531970/0/us-open-flashbacks-1990s',

96
youtube_dl/extractor/cda.py Executable file
View File

@@ -0,0 +1,96 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
decode_packed_codes,
ExtractorError,
parse_duration
)
class CDAIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?cda\.pl/video|ebd\.cda\.pl/[0-9]+x[0-9]+)/(?P<id>[0-9a-z]+)'
_TESTS = [{
'url': 'http://www.cda.pl/video/5749950c',
'md5': '6f844bf51b15f31fae165365707ae970',
'info_dict': {
'id': '5749950c',
'ext': 'mp4',
'height': 720,
'title': 'Oto dlaczego przed zakrętem należy zwolnić.',
'duration': 39
}
}, {
'url': 'http://www.cda.pl/video/57413289',
'md5': 'a88828770a8310fc00be6c95faf7f4d5',
'info_dict': {
'id': '57413289',
'ext': 'mp4',
'title': 'Lądowanie na lotnisku na Maderze',
'duration': 137
}
}, {
'url': 'http://ebd.cda.pl/0x0/5749950c',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage('http://ebd.cda.pl/0x0/' + video_id, video_id)
if 'Ten film jest dostępny dla użytkowników premium' in webpage:
raise ExtractorError('This video is only available for premium users.', expected=True)
title = self._html_search_regex(r'<title>(.+?)</title>', webpage, 'title')
formats = []
info_dict = {
'id': video_id,
'title': title,
'formats': formats,
'duration': None,
}
def extract_format(page, version):
unpacked = decode_packed_codes(page)
format_url = self._search_regex(
r"url:\\'(.+?)\\'", unpacked, '%s url' % version, fatal=False)
if not format_url:
return
f = {
'url': format_url,
}
m = re.search(
r'<a[^>]+data-quality="(?P<format_id>[^"]+)"[^>]+href="[^"]+"[^>]+class="[^"]*quality-btn-active[^"]*">(?P<height>[0-9]+)p',
page)
if m:
f.update({
'format_id': m.group('format_id'),
'height': int(m.group('height')),
})
info_dict['formats'].append(f)
if not info_dict['duration']:
info_dict['duration'] = parse_duration(self._search_regex(
r"duration:\\'(.+?)\\'", unpacked, 'duration', fatal=False))
extract_format(webpage, 'default')
for href, resolution in re.findall(
r'<a[^>]+data-quality="[^"]+"[^>]+href="([^"]+)"[^>]+class="quality-btn"[^>]*>([0-9]+p)',
webpage):
webpage = self._download_webpage(
href, video_id, 'Downloading %s version information' % resolution, fatal=False)
if not webpage:
# Manually report warning because empty page is returned when
# invalid version is requested.
self.report_warning('Unable to download %s version information' % resolution)
continue
extract_format(webpage, resolution)
self._sort_formats(formats)
return info_dict

View File

@@ -5,7 +5,6 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse,
)
@@ -13,6 +12,7 @@ from ..utils import (
ExtractorError,
float_or_none,
sanitized_Request,
urlencode_postdata,
)
@@ -102,7 +102,7 @@ class CeskaTelevizeIE(InfoExtractor):
req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=compat_urllib_parse.urlencode(data))
data=urlencode_postdata(data))
req.add_header('Content-type', 'application/x-www-form-urlencoded')
req.add_header('x-addr', '127.0.0.1')
@@ -129,7 +129,8 @@ class CeskaTelevizeIE(InfoExtractor):
formats = []
for format_id, stream_url in item['streamUrls'].items():
formats.extend(self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4', entry_protocol='m3u8_native'))
stream_url, playlist_id, 'mp4',
entry_protocol='m3u8_native', fatal=False))
self._sort_formats(formats)
item_id = item.get('id') or item['assetId']

View File

@@ -48,6 +48,7 @@ class ChaturbateIE(InfoExtractor):
raise ExtractorError('Unable to find stream URL')
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
self._sort_formats(formats)
return {
'id': video_id,

View File

@@ -19,7 +19,7 @@ def _decode(s):
class CliphunterIE(InfoExtractor):
IE_NAME = 'cliphunter'
_VALID_URL = r'''(?x)http://(?:www\.)?cliphunter\.com/w/
_VALID_URL = r'''(?x)https?://(?:www\.)?cliphunter\.com/w/
(?P<id>[0-9]+)/
(?P<seo>.+?)(?:$|[#\?])
'''

View File

@@ -8,7 +8,7 @@ from ..utils import (
class ClipsyndicateIE(InfoExtractor):
_VALID_URL = r'http://(?:chic|www)\.clipsyndicate\.com/video/play(list/\d+)?/(?P<id>\d+)'
_VALID_URL = r'https?://(?:chic|www)\.clipsyndicate\.com/video/play(list/\d+)?/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.clipsyndicate.com/video/play/4629301/brick_briscoe',

View File

@@ -6,7 +6,7 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_HTTPError,
)
from ..utils import (
@@ -64,7 +64,7 @@ class CloudyIE(InfoExtractor):
'errorUrl': error_url,
})
data_url = self._API_URL % (video_host, compat_urllib_parse.urlencode(form))
data_url = self._API_URL % (video_host, compat_urllib_parse_urlencode(form))
player_data = self._download_webpage(
data_url, video_id, 'Downloading player data')
data = compat_parse_qs(player_data)

View File

@@ -12,7 +12,7 @@ from ..utils import (
class ClubicIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?clubic\.com/video/(?:[^/]+/)*video.*-(?P<id>[0-9]+)\.html'
_VALID_URL = r'https?://(?:www\.)?clubic\.com/video/(?:[^/]+/)*video.*-(?P<id>[0-9]+)\.html'
_TESTS = [{
'url': 'http://www.clubic.com/video/clubic-week/video-clubic-week-2-0-le-fbi-se-lance-dans-la-photo-d-identite-448474.html',

View File

@@ -0,0 +1,36 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
class CNBCIE(InfoExtractor):
_VALID_URL = r'https?://video\.cnbc\.com/gallery/\?video=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://video.cnbc.com/gallery/?video=3000503714',
'info_dict': {
'id': '3000503714',
'ext': 'mp4',
'title': 'Fighting zombies is big business',
'description': 'md5:0c100d8e1a7947bd2feec9a5550e519e',
'timestamp': 1459332000,
'upload_date': '20160330',
'uploader': 'NBCU-CNBC',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(
'http://link.theplatform.com/s/gZWlPC/media/guid/2408950221/%s?mbr=true&manifest=m3u' % video_id,
{'force_smil_url': True}),
'id': video_id,
}

View File

@@ -11,7 +11,7 @@ from ..utils import (
class ComCarCoffIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?comediansincarsgettingcoffee\.com/(?P<id>[a-z0-9\-]*)'
_VALID_URL = r'https?://(?:www\.)?comediansincarsgettingcoffee\.com/(?P<id>[a-z0-9\-]*)'
_TESTS = [{
'url': 'http://comediansincarsgettingcoffee.com/miranda-sings-happy-thanksgiving-miranda/',
'info_dict': {
@@ -41,7 +41,13 @@ class ComCarCoffIE(InfoExtractor):
display_id = full_data['activeVideo']['video']
video_data = full_data.get('videos', {}).get(display_id) or full_data['singleshots'][display_id]
video_id = compat_str(video_data['mediaId'])
title = video_data['title']
formats = self._extract_m3u8_formats(
video_data['mediaUrl'], video_id, 'mp4')
self._sort_formats(formats)
thumbnails = [{
'url': video_data['images']['thumb'],
}, {
@@ -54,15 +60,14 @@ class ComCarCoffIE(InfoExtractor):
video_data.get('duration'))
return {
'_type': 'url_transparent',
'url': 'crackle:%s' % video_id,
'id': video_id,
'display_id': display_id,
'title': video_data['title'],
'title': title,
'description': video_data.get('description'),
'timestamp': timestamp,
'duration': duration,
'thumbnails': thumbnails,
'formats': formats,
'season_number': int_or_none(video_data.get('season')),
'episode_number': int_or_none(video_data.get('episode')),
'webpage_url': 'http://comediansincarsgettingcoffee.com/%s' % (video_data.get('urlSlug', video_data.get('slug'))),

View File

@@ -5,7 +5,7 @@ import re
from .mtv import MTVServicesInfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse,
compat_urllib_parse_urlencode,
)
from ..utils import (
ExtractorError,
@@ -201,7 +201,7 @@ class ComedyCentralShowsIE(MTVServicesInfoExtractor):
# Correct cc.com in uri
uri = re.sub(r'(episode:[^.]+)(\.cc)?\.com', r'\1.com', uri)
index_url = 'http://%s.cc.com/feeds/mrss?%s' % (show_name, compat_urllib_parse.urlencode({'uri': uri}))
index_url = 'http://%s.cc.com/feeds/mrss?%s' % (show_name, compat_urllib_parse_urlencode({'uri': uri}))
idoc = self._download_xml(
index_url, epTitle,
'Downloading show index', 'Unable to download episode index')

View File

@@ -21,9 +21,11 @@ from ..compat import (
compat_os_name,
compat_str,
compat_urllib_error,
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_urllib_request,
compat_urlparse,
)
from ..downloader.f4m import remove_encrypted_media
from ..utils import (
NO_DEFAULT,
age_restricted,
@@ -48,6 +50,7 @@ from ..utils import (
determine_protocol,
parse_duration,
mimetype2ext,
update_Request,
update_url_query,
)
@@ -346,7 +349,7 @@ class InfoExtractor(object):
def IE_NAME(self):
return compat_str(type(self).__name__[:-2])
def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers=None, query=None):
def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers={}, query={}):
""" Returns the response handle """
if note is None:
self.report_download_webpage(video_id)
@@ -356,11 +359,14 @@ class InfoExtractor(object):
else:
self.to_screen('%s: %s' % (video_id, note))
# data, headers and query params will be ignored for `Request` objects
if isinstance(url_or_request, compat_str):
if isinstance(url_or_request, compat_urllib_request.Request):
url_or_request = update_Request(
url_or_request, data=data, headers=headers, query=query)
else:
if query:
url_or_request = update_url_query(url_or_request, query)
if data or headers:
url_or_request = sanitized_Request(url_or_request, data, headers or {})
url_or_request = sanitized_Request(url_or_request, data, headers)
try:
return self._downloader.urlopen(url_or_request)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
@@ -376,7 +382,7 @@ class InfoExtractor(object):
self._downloader.report_warning(errmsg)
return False
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers=None, query=None):
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers={}, query={}):
""" Returns a tuple (page content as string, URL handle) """
# Strip hashes from the URL (#1038)
if isinstance(url_or_request, (compat_str, str)):
@@ -469,7 +475,7 @@ class InfoExtractor(object):
return content
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None, data=None, headers=None, query=None):
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None, data=None, headers={}, query={}):
""" Returns the data of the page as a string """
success = False
try_count = 0
@@ -490,7 +496,7 @@ class InfoExtractor(object):
def _download_xml(self, url_or_request, video_id,
note='Downloading XML', errnote='Unable to download XML',
transform_source=None, fatal=True, encoding=None, data=None, headers=None, query=None):
transform_source=None, fatal=True, encoding=None, data=None, headers={}, query={}):
"""Return the xml as an xml.etree.ElementTree.Element"""
xml_string = self._download_webpage(
url_or_request, video_id, note, errnote, fatal=fatal, encoding=encoding, data=data, headers=headers, query=query)
@@ -504,7 +510,7 @@ class InfoExtractor(object):
note='Downloading JSON metadata',
errnote='Unable to download JSON metadata',
transform_source=None,
fatal=True, encoding=None, data=None, headers=None, query=None):
fatal=True, encoding=None, data=None, headers={}, query={}):
json_string = self._download_webpage(
url_or_request, video_id, note, errnote, fatal=fatal,
encoding=encoding, data=data, headers=headers, query=query)
@@ -862,6 +868,7 @@ class InfoExtractor(object):
proto_preference = 0 if determine_protocol(f) in ['http', 'https'] else -0.1
if f.get('vcodec') == 'none': # audio only
preference -= 50
if self._downloader.params.get('prefer_free_formats'):
ORDER = ['aac', 'mp3', 'm4a', 'webm', 'ogg', 'opus']
else:
@@ -872,6 +879,8 @@ class InfoExtractor(object):
except ValueError:
audio_ext_preference = -1
else:
if f.get('acodec') == 'none': # video only
preference -= 40
if self._downloader.params.get('prefer_free_formats'):
ORDER = ['flv', 'mp4', 'webm']
else:
@@ -986,6 +995,11 @@ class InfoExtractor(object):
if not media_nodes:
manifest_version = '2.0'
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media')
# Remove unsupported DRM protected media from final formats
# rendition (see https://github.com/rg3/youtube-dl/issues/8573).
media_nodes = remove_encrypted_media(media_nodes)
if not media_nodes:
return formats
base_url = xpath_text(
manifest, ['{http://ns.adobe.com/f4m/1.0}baseURL', '{http://ns.adobe.com/f4m/2.0}baseURL'],
'base URL', default=None)
@@ -1018,8 +1032,6 @@ class InfoExtractor(object):
'height': int_or_none(media_el.attrib.get('height')),
'preference': preference,
})
self._sort_formats(formats)
return formats
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
@@ -1140,7 +1152,6 @@ class InfoExtractor(object):
last_media = None
formats.append(f)
last_info = {}
self._sort_formats(formats)
return formats
@staticmethod
@@ -1297,7 +1308,7 @@ class InfoExtractor(object):
'plugin': 'flowplayer-3.2.0.1',
}
f4m_url += '&' if '?' in f4m_url else '?'
f4m_url += compat_urllib_parse.urlencode(f4m_params)
f4m_url += compat_urllib_parse_urlencode(f4m_params)
formats.extend(self._extract_f4m_formats(f4m_url, video_id, f4m_id='hds', fatal=False))
continue
@@ -1314,8 +1325,6 @@ class InfoExtractor(object):
})
continue
self._sort_formats(formats)
return formats
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
@@ -1326,7 +1335,7 @@ class InfoExtractor(object):
if not src or src in urls:
continue
urls.append(src)
ext = textstream.get('ext') or determine_ext(src) or mimetype2ext(textstream.get('type'))
ext = textstream.get('ext') or mimetype2ext(textstream.get('type')) or determine_ext(src)
lang = textstream.get('systemLanguage') or textstream.get('systemLanguageName') or textstream.get('lang') or subtitles_lang
subtitles.setdefault(lang, []).append({
'url': src,
@@ -1506,9 +1515,16 @@ class InfoExtractor(object):
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
media_template = representation_ms_info['media_template']
media_template = media_template.replace('$RepresentationID$', representation_id)
media_template = re.sub(r'\$(Number|Bandwidth)(?:%(0\d+)d)?\$', r'%(\1)\2d', media_template)
media_template = re.sub(r'\$(Number|Bandwidth)\$', r'%(\1)d', media_template)
media_template = re.sub(r'\$(Number|Bandwidth)%(\d+)\$', r'%(\1)\2d', media_template)
media_template.replace('$$', '$')
representation_ms_info['segment_urls'] = [media_template % {'Number': segment_number, 'Bandwidth': representation_attrib.get('bandwidth')} for segment_number in range(representation_ms_info['start_number'], representation_ms_info['total_number'] + representation_ms_info['start_number'])]
representation_ms_info['segment_urls'] = [
media_template % {
'Number': segment_number,
'Bandwidth': representation_attrib.get('bandwidth')}
for segment_number in range(
representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])]
if 'segment_urls' in representation_ms_info:
f.update({
'segment_urls': representation_ms_info['segment_urls'],
@@ -1533,7 +1549,6 @@ class InfoExtractor(object):
existing_format.update(f)
else:
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
self._sort_formats(formats)
return formats
def _live_title(self, name):

View File

@@ -0,0 +1,36 @@
from __future__ import unicode_literals
import os
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import url_basename
class RtmpIE(InfoExtractor):
IE_DESC = False # Do not list
_VALID_URL = r'(?i)rtmp[est]?://.+'
_TESTS = [{
'url': 'rtmp://cp44293.edgefcs.net/ondemand?auth=daEcTdydfdqcsb8cZcDbAaCbhamacbbawaS-bw7dBb-bWG-GqpGFqCpNCnGoyL&aifp=v001&slist=public/unsecure/audio/2c97899446428e4301471a8cb72b4b97--audio--pmg-20110908-0900a_flv_aac_med_int.mp4',
'only_matching': True,
}, {
'url': 'rtmp://edge.live.hitbox.tv/live/dimak',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
title = compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0])
return {
'id': video_id,
'title': title,
'formats': [{
'url': url,
'ext': 'flv',
'format_id': compat_urlparse.urlparse(url).scheme,
}],
}

View File

@@ -5,7 +5,7 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
compat_urlparse,
)
@@ -45,7 +45,7 @@ class CondeNastIE(InfoExtractor):
'wmagazine': 'W Magazine',
}
_VALID_URL = r'http://(?:video|www|player)\.(?P<site>%s)\.com/(?P<type>watch|series|video|embed(?:js)?)/(?P<id>[^/?#]+)' % '|'.join(_SITES.keys())
_VALID_URL = r'https?://(?:video|www|player)\.(?P<site>%s)\.com/(?P<type>watch|series|video|embed(?:js)?)/(?P<id>[^/?#]+)' % '|'.join(_SITES.keys())
IE_DESC = 'Condé Nast media group: %s' % ', '.join(sorted(_SITES.values()))
EMBED_URL = r'(?:https?:)?//player\.(?P<site>%s)\.com/(?P<type>embed(?:js)?)/.+?' % '|'.join(_SITES.keys())
@@ -97,7 +97,7 @@ class CondeNastIE(InfoExtractor):
video_id = self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id')
player_id = self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id')
target = self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target')
data = compat_urllib_parse.urlencode({'videoId': video_id,
data = compat_urllib_parse_urlencode({'videoId': video_id,
'playerId': player_id,
'target': target,
})

View File

@@ -11,8 +11,8 @@ from math import pow, sqrt, floor
from .common import InfoExtractor
from ..compat import (
compat_etree_fromstring,
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode,
compat_urllib_request,
compat_urlparse,
)
@@ -54,7 +54,7 @@ class CrunchyrollBaseIE(InfoExtractor):
def _real_initialize(self):
self._login()
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None):
def _download_webpage(self, url_or_request, *args, **kwargs):
request = (url_or_request if isinstance(url_or_request, compat_urllib_request.Request)
else sanitized_Request(url_or_request))
# Accept-Language must be set explicitly to accept any language to avoid issues
@@ -65,8 +65,7 @@ class CrunchyrollBaseIE(InfoExtractor):
# Crunchyroll to not work in georestriction cases in some browsers that don't place
# the locale lang first in header. However allowing any language seems to workaround the issue.
request.add_header('Accept-Language', '*')
return super(CrunchyrollBaseIE, self)._download_webpage(
request, video_id, note, errnote, fatal, tries, timeout, encoding)
return super(CrunchyrollBaseIE, self)._download_webpage(request, *args, **kwargs)
@staticmethod
def _add_skip_wall(url):
@@ -79,7 +78,7 @@ class CrunchyrollBaseIE(InfoExtractor):
# See https://github.com/rg3/youtube-dl/issues/7202.
qs['skip_wall'] = ['1']
return compat_urlparse.urlunparse(
parsed_url._replace(query=compat_urllib_parse.urlencode(qs, True)))
parsed_url._replace(query=compat_urllib_parse_urlencode(qs, True)))
class CrunchyrollIE(CrunchyrollBaseIE):
@@ -309,7 +308,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
playerdata_url = compat_urllib_parse_unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url'))
playerdata_req = sanitized_Request(playerdata_url)
playerdata_req.data = compat_urllib_parse.urlencode({'current_page': webpage_url})
playerdata_req.data = urlencode_postdata({'current_page': webpage_url})
playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
playerdata = self._download_webpage(playerdata_req, video_id, note='Downloading media info')
@@ -323,7 +322,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
streamdata_req = sanitized_Request(
'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s'
% (stream_id, stream_format, stream_quality),
compat_urllib_parse.urlencode({'current_page': url}).encode('utf-8'))
compat_urllib_parse_urlencode({'current_page': url}).encode('utf-8'))
streamdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
streamdata = self._download_xml(
streamdata_req, video_id,

View File

@@ -15,7 +15,7 @@ from .senateisvp import SenateISVPIE
class CSpanIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?c-span\.org/video/\?(?P<id>[0-9a-f]+)'
_VALID_URL = r'https?://(?:www\.)?c-span\.org/video/\?(?P<id>[0-9a-f]+)'
IE_DESC = 'C-SPAN'
_TESTS = [{
'url': 'http://www.c-span.org/video/?313572-1/HolderonV',

View File

@@ -8,7 +8,7 @@ from ..utils import parse_iso8601, ExtractorError
class CtsNewsIE(InfoExtractor):
IE_DESC = '華視新聞'
# https connection failed (Connection reset)
_VALID_URL = r'http://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html'
_VALID_URL = r'https?://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html'
_TESTS = [{
'url': 'http://news.cts.com.tw/cts/international/201501/201501291578109.html',
'md5': 'a9875cb790252b08431186d741beaabe',

View File

@@ -57,6 +57,7 @@ class CWTVIE(InfoExtractor):
formats = self._extract_m3u8_formats(
video_data['videos']['variantplaylist']['uri'], video_id, 'mp4')
self._sort_formats(formats)
thumbnails = [{
'url': image['uri'],

View File

@@ -8,8 +8,8 @@ import itertools
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode,
compat_urlparse,
)
from ..utils import (
@@ -70,7 +70,7 @@ class DaumIE(InfoExtractor):
def _real_extract(self, url):
video_id = compat_urllib_parse_unquote(self._match_id(url))
query = compat_urllib_parse.urlencode({'vid': video_id})
query = compat_urllib_parse_urlencode({'vid': video_id})
movie_data = self._download_json(
'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json?' + query,
video_id, 'Downloading video formats info')
@@ -86,7 +86,7 @@ class DaumIE(InfoExtractor):
formats = []
for format_el in movie_data['output_list']['output_list']:
profile = format_el['profile']
format_query = compat_urllib_parse.urlencode({
format_query = compat_urllib_parse_urlencode({
'vid': video_id,
'profile': profile,
})

View File

@@ -6,7 +6,7 @@ import base64
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_str,
)
from ..utils import (
@@ -15,6 +15,7 @@ from ..utils import (
sanitized_Request,
smuggle_url,
unsmuggle_url,
urlencode_postdata,
)
@@ -106,7 +107,7 @@ class DCNVideoIE(DCNBaseIE):
webpage = self._download_webpage(
'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' +
compat_urllib_parse.urlencode({
compat_urllib_parse_urlencode({
'id': video_data['id'],
'user_id': video_data['user_id'],
'signature': video_data['signature'],
@@ -133,7 +134,7 @@ class DCNLiveIE(DCNBaseIE):
webpage = self._download_webpage(
'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' +
compat_urllib_parse.urlencode({
compat_urllib_parse_urlencode({
'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
'signature': channel_data['signature'],
@@ -174,7 +175,7 @@ class DCNSeasonIE(InfoExtractor):
data['show_id'] = show_id
request = sanitized_Request(
'http://admin.mangomolo.com/analytics/index.php/plus/show',
compat_urllib_parse.urlencode(data),
urlencode_postdata(data),
{
'Origin': 'http://www.dcndigital.ae',
'Content-Type': 'application/x-www-form-urlencoded'

View File

@@ -6,7 +6,7 @@ from ..compat import compat_str
class DctpTvIE(InfoExtractor):
_VALID_URL = r'http://www.dctp.tv/(#/)?filme/(?P<id>.+?)/$'
_VALID_URL = r'https?://www.dctp.tv/(#/)?filme/(?P<id>.+?)/$'
_TEST = {
'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
'info_dict': {

View File

@@ -41,7 +41,9 @@ class DeezerPlaylistIE(InfoExtractor):
'Deezer said: %s' % geoblocking_msg, expected=True)
data_json = self._search_regex(
r'naboo\.display\(\'[^\']+\',\s*(.*?)\);\n', webpage, 'data JSON')
(r'__DZR_APP_STATE__\s*=\s*({.+?})\s*</script>',
r'naboo\.display\(\'[^\']+\',\s*(.*?)\);\n'),
webpage, 'data JSON')
data = json.loads(data_json)
playlist_title = data.get('DATA', {}).get('TITLE')

View File

@@ -5,7 +5,7 @@ from .common import InfoExtractor
class DefenseGouvFrIE(InfoExtractor):
IE_NAME = 'defense.gouv.fr'
_VALID_URL = r'http://.*?\.defense\.gouv\.fr/layout/set/ligthboxvideo/base-de-medias/webtv/(?P<id>[^/?#]*)'
_VALID_URL = r'https?://.*?\.defense\.gouv\.fr/layout/set/ligthboxvideo/base-de-medias/webtv/(?P<id>[^/?#]*)'
_TEST = {
'url': 'http://www.defense.gouv.fr/layout/set/ligthboxvideo/base-de-medias/webtv/attaque-chimique-syrienne-du-21-aout-2013-1',

View File

@@ -38,6 +38,7 @@ class DFBIE(InfoExtractor):
token_el = f4m_info.find('token')
manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth'] + '&hdcore=3.2.0'
formats = self._extract_f4m_formats(manifest_url, display_id)
self._sort_formats(formats)
return {
'id': video_id,

View File

@@ -9,7 +9,7 @@ from ..compat import compat_str
class DiscoveryIE(InfoExtractor):
_VALID_URL = r'''(?x)http://(?:www\.)?(?:
_VALID_URL = r'''(?x)https?://(?:www\.)?(?:
discovery|
investigationdiscovery|
discoverylife|
@@ -63,18 +63,23 @@ class DiscoveryIE(InfoExtractor):
video_title = info.get('playlist_title') or info.get('video_title')
entries = [{
'id': compat_str(video_info['id']),
'formats': self._extract_m3u8_formats(
entries = []
for idx, video_info in enumerate(info['playlist']):
formats = self._extract_m3u8_formats(
video_info['src'], display_id, 'mp4', 'm3u8_native', m3u8_id='hls',
note='Download m3u8 information for video %d' % (idx + 1)),
'title': video_info['title'],
'description': video_info.get('description'),
'duration': parse_duration(video_info.get('video_length')),
'webpage_url': video_info.get('href') or video_info.get('url'),
'thumbnail': video_info.get('thumbnailURL'),
'alt_title': video_info.get('secondary_title'),
'timestamp': parse_iso8601(video_info.get('publishedDate')),
} for idx, video_info in enumerate(info['playlist'])]
note='Download m3u8 information for video %d' % (idx + 1))
self._sort_formats(formats)
entries.append({
'id': compat_str(video_info['id']),
'formats': formats,
'title': video_info['title'],
'description': video_info.get('description'),
'duration': parse_duration(video_info.get('video_length')),
'webpage_url': video_info.get('href') or video_info.get('url'),
'thumbnail': video_info.get('thumbnailURL'),
'alt_title': video_info.get('secondary_title'),
'timestamp': parse_iso8601(video_info.get('publishedDate')),
})
return self.playlist_result(entries, display_id, video_title)

View File

@@ -10,7 +10,7 @@ from ..compat import (compat_str, compat_basestring)
class DouyuTVIE(InfoExtractor):
IE_DESC = '斗鱼'
_VALID_URL = r'http://(?:www\.)?douyutv\.com/(?P<id>[A-Za-z0-9]+)'
_VALID_URL = r'https?://(?:www\.)?douyu(?:tv)?\.com/(?P<id>[A-Za-z0-9]+)'
_TESTS = [{
'url': 'http://www.douyutv.com/iseven',
'info_dict': {
@@ -60,6 +60,9 @@ class DouyuTVIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.douyu.com/xiaocang',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@@ -10,7 +10,7 @@ from ..utils import int_or_none
class DPlayIE(InfoExtractor):
_VALID_URL = r'http://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'http://it.dplay.com/take-me-out/stagione-1-episodio-25/',
@@ -118,6 +118,8 @@ class DPlayIE(InfoExtractor):
if info.get(protocol):
extract_formats(protocol, info[protocol])
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,

View File

@@ -6,7 +6,6 @@ import itertools
from .amp import AMPIE
from ..compat import (
compat_HTTPError,
compat_urllib_parse,
compat_urlparse,
)
from ..utils import (
@@ -14,6 +13,7 @@ from ..utils import (
clean_html,
int_or_none,
sanitized_Request,
urlencode_postdata
)
@@ -50,7 +50,7 @@ class DramaFeverBaseIE(AMPIE):
}
request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
self._LOGIN_URL, urlencode_postdata(login_form))
response = self._download_webpage(
request, None, 'Logging in as %s' % username)

View File

@@ -7,7 +7,7 @@ from .zdf import ZDFIE
class DreiSatIE(ZDFIE):
IE_NAME = '3sat'
_VALID_URL = r'(?:http://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
_VALID_URL = r'(?:https?://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
_TESTS = [
{
'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918',

View File

@@ -15,7 +15,7 @@ class DVTVIE(InfoExtractor):
IE_NAME = 'dvtv'
IE_DESC = 'http://video.aktualne.cz/'
_VALID_URL = r'http://video\.aktualne\.cz/(?:[^/]+/)+r~(?P<id>[0-9a-f]{32})'
_VALID_URL = r'https?://video\.aktualne\.cz/(?:[^/]+/)+r~(?P<id>[0-9a-f]{32})'
_TESTS = [{
'url': 'http://video.aktualne.cz/dvtv/vondra-o-ceskem-stoleti-pri-pohledu-na-havla-mi-bylo-trapne/r~e5efe9ca855511e4833a0025900fea04/',

View File

@@ -39,13 +39,13 @@ class DWIE(InfoExtractor):
hidden_inputs = self._hidden_inputs(webpage)
title = hidden_inputs['media_title']
formats = []
if hidden_inputs.get('player_type') == 'video' and hidden_inputs.get('stream_file') == '1':
formats = self._extract_smil_formats(
'http://www.dw.com/smil/v-%s' % media_id, media_id,
transform_source=lambda s: s.replace(
'rtmp://tv-od.dw.de/flash/',
'http://tv-download.dw.de/dwtv_video/flv/'))
self._sort_formats(formats)
else:
formats = [{'url': hidden_inputs['file_name']}]

View File

@@ -7,7 +7,7 @@ from .common import InfoExtractor
class EchoMskIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?echo\.msk\.ru/sounds/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?echo\.msk\.ru/sounds/(?P<id>\d+)'
_TEST = {
'url': 'http://www.echo.msk.ru/sounds/1464134.html',
'md5': '2e44b3b78daff5b458e4dbc37f191f7c',

View File

@@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..compat import compat_urllib_parse_urlencode
from ..utils import (
ExtractorError,
unescapeHTML
@@ -43,7 +43,7 @@ class EroProfileIE(InfoExtractor):
if username is None:
return
query = compat_urllib_parse.urlencode({
query = compat_urllib_parse_urlencode({
'username': username,
'password': password,
'url': 'http://www.eroprofile.com/',

View File

@@ -8,7 +8,7 @@ from .common import InfoExtractor
class ExfmIE(InfoExtractor):
IE_NAME = 'exfm'
IE_DESC = 'ex.fm'
_VALID_URL = r'http://(?:www\.)?ex\.fm/song/(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?ex\.fm/song/(?P<id>[^/]+)'
_SOUNDCLOUD_URL = r'http://(?:www\.)?api\.soundcloud\.com/tracks/([^/]+)/stream'
_TESTS = [
{

View File

@@ -5,19 +5,18 @@ import hashlib
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
compat_urlparse,
)
from ..utils import (
encode_dict,
ExtractorError,
sanitized_Request,
urlencode_postdata,
)
class FC2IE(InfoExtractor):
_VALID_URL = r'^http://video\.fc2\.com/(?:[^/]+/)*content/(?P<id>[^/]+)'
_VALID_URL = r'^https?://video\.fc2\.com/(?:[^/]+/)*content/(?P<id>[^/]+)'
IE_NAME = 'fc2'
_NETRC_MACHINE = 'fc2'
_TESTS = [{
@@ -57,7 +56,7 @@ class FC2IE(InfoExtractor):
'Submit': ' Login ',
}
login_data = compat_urllib_parse.urlencode(encode_dict(login_form_strs)).encode('utf-8')
login_data = urlencode_postdata(login_form_strs)
request = sanitized_Request(
'https://secure.id.fc2.com/index.php?mode=login&switch_language=en', login_data)

View File

@@ -4,7 +4,7 @@ from .common import InfoExtractor
class FirstpostIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?firstpost\.com/[^/]+/.*-(?P<id>[0-9]+)\.html'
_VALID_URL = r'https?://(?:www\.)?firstpost\.com/[^/]+/.*-(?P<id>[0-9]+)\.html'
_TEST = {
'url': 'http://www.firstpost.com/india/india-to-launch-indigenous-aircraft-carrier-monday-1025403.html',

View File

@@ -8,7 +8,7 @@ from ..utils import int_or_none
class FirstTVIE(InfoExtractor):
IE_NAME = '1tv'
IE_DESC = 'Первый канал'
_VALID_URL = r'http://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>.+)'
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>.+)'
_TESTS = [{
'url': 'http://www.1tv.ru/videoarchive/73390',

View File

@@ -4,8 +4,8 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_parse_qs,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
compat_urlparse,
)
@@ -109,7 +109,7 @@ class FiveMinIE(InfoExtractor):
response = self._download_json(
'https://syn.5min.com/handlers/SenseHandler.ashx?' +
compat_urllib_parse.urlencode({
compat_urllib_parse_urlencode({
'func': 'GetResults',
'playlist': video_id,
'sid': sid,

View File

@@ -10,7 +10,7 @@ from ..utils import (
class FKTVIE(InfoExtractor):
IE_NAME = 'fernsehkritik.tv'
_VALID_URL = r'http://(?:www\.)?fernsehkritik\.tv/folge-(?P<id>[0-9]+)(?:/.*)?'
_VALID_URL = r'https?://(?:www\.)?fernsehkritik\.tv/folge-(?P<id>[0-9]+)(?:/.*)?'
_TEST = {
'url': 'http://fernsehkritik.tv/folge-1',

View File

@@ -1,7 +1,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse
from ..compat import compat_urllib_parse_urlencode
from ..utils import (
ExtractorError,
int_or_none,
@@ -42,7 +42,7 @@ class FlickrIE(InfoExtractor):
}
if secret:
query['secret'] = secret
data = self._download_json(self._API_BASE_URL + compat_urllib_parse.urlencode(query), video_id, note)
data = self._download_json(self._API_BASE_URL + compat_urllib_parse_urlencode(query), video_id, note)
if data['stat'] != 'ok':
raise ExtractorError(data['message'])
return data

View File

@@ -5,7 +5,7 @@ from .common import InfoExtractor
class FootyRoomIE(InfoExtractor):
_VALID_URL = r'http://footyroom\.com/(?P<id>[^/]+)'
_VALID_URL = r'https?://footyroom\.com/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://footyroom.com/schalke-04-0-2-real-madrid-2015-02/',
'info_dict': {

View File

@@ -16,6 +16,9 @@ class FOXIE(InfoExtractor):
'title': 'Official Trailer: Gotham',
'description': 'Tracing the rise of the great DC Comics Super-Villains and vigilantes, Gotham reveals an entirely new chapter that has never been told.',
'duration': 129,
'timestamp': 1400020798,
'upload_date': '20140513',
'uploader': 'NEWA-FNG-FOXCOM',
},
'add_ie': ['ThePlatform'],
}

View File

@@ -4,7 +4,7 @@ from .common import InfoExtractor
class FoxgayIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?foxgay\.com/videos/(?:\S+-)?(?P<id>\d+)\.shtml'
_VALID_URL = r'https?://(?:www\.)?foxgay\.com/videos/(?:\S+-)?(?P<id>\d+)\.shtml'
_TEST = {
'url': 'http://foxgay.com/videos/fuck-turkish-style-2582.shtml',
'md5': '80d72beab5d04e1655a56ad37afe6841',

View File

@@ -18,8 +18,8 @@ class FoxNewsIE(AMPIE):
'title': 'Frozen in Time',
'description': '16-year-old girl is size of toddler',
'duration': 265,
# 'timestamp': 1304411491,
# 'upload_date': '20110503',
'timestamp': 1304411491,
'upload_date': '20110503',
'thumbnail': 're:^https?://.*\.jpg$',
},
},
@@ -32,8 +32,8 @@ class FoxNewsIE(AMPIE):
'title': "Rep. Luis Gutierrez on if Obama's immigration plan is legal",
'description': "Congressman discusses president's plan",
'duration': 292,
# 'timestamp': 1417662047,
# 'upload_date': '20141204',
'timestamp': 1417662047,
'upload_date': '20141204',
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {

View File

@@ -6,7 +6,7 @@ from ..utils import int_or_none
class FranceInterIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?franceinter\.fr/player/reecouter\?play=(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?franceinter\.fr/player/reecouter\?play=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.franceinter.fr/player/reecouter?play=793962',
'md5': '4764932e466e6f6c79c317d2e74f6884',

View File

@@ -60,28 +60,31 @@ class FranceTVBaseInfoExtractor(InfoExtractor):
video_id, 'Downloading f4m manifest token', fatal=False)
if f4m_url:
formats.extend(self._extract_f4m_formats(
f4m_url + '&hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id, 1, format_id))
f4m_url + '&hdcore=3.7.0&plugin=aasp-3.7.0.39.44',
video_id, f4m_id=format_id, fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(video_url, video_id, 'mp4', m3u8_id=format_id))
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False))
elif video_url.startswith('rtmp'):
formats.append({
'url': video_url,
'format_id': 'rtmp-%s' % format_id,
'ext': 'flv',
'preference': 1,
})
else:
formats.append({
'url': video_url,
'format_id': format_id,
'preference': -1,
})
if self._is_valid_url(video_url, video_id, format_id):
formats.append({
'url': video_url,
'format_id': format_id,
})
self._sort_formats(formats)
title = info['titre']
subtitle = info.get('sous_titre')
if subtitle:
title += ' - %s' % subtitle
title = title.strip()
subtitles = {}
subtitles_list = [{
@@ -125,13 +128,13 @@ class PluzzIE(FranceTVBaseInfoExtractor):
class FranceTvInfoIE(FranceTVBaseInfoExtractor):
IE_NAME = 'francetvinfo.fr'
_VALID_URL = r'https?://(?:www|mobile)\.francetvinfo\.fr/.*/(?P<title>.+)\.html'
_VALID_URL = r'https?://(?:www|mobile|france3-regions)\.francetvinfo\.fr/.*/(?P<title>.+)\.html'
_TESTS = [{
'url': 'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
'info_dict': {
'id': '84981923',
'ext': 'flv',
'ext': 'mp4',
'title': 'Soir 3',
'upload_date': '20130826',
'timestamp': 1377548400,
@@ -139,6 +142,10 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
'fr': 'mincount:2',
},
},
'params': {
# m3u8 downloads
'skip_download': True,
},
}, {
'url': 'http://www.francetvinfo.fr/elections/europeennes/direct-europeennes-regardez-le-debat-entre-les-candidats-a-la-presidence-de-la-commission_600639.html',
'info_dict': {
@@ -155,11 +162,32 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
'url': 'http://www.francetvinfo.fr/economie/entreprises/les-entreprises-familiales-le-secret-de-la-reussite_933271.html',
'md5': 'f485bda6e185e7d15dbc69b72bae993e',
'info_dict': {
'id': '556e03339473995ee145930c',
'id': 'NI_173343',
'ext': 'mp4',
'title': 'Les entreprises familiales : le secret de la réussite',
'thumbnail': 're:^https?://.*\.jpe?g$',
}
'timestamp': 1433273139,
'upload_date': '20150602',
},
'params': {
# m3u8 downloads
'skip_download': True,
},
}, {
'url': 'http://france3-regions.francetvinfo.fr/bretagne/cotes-d-armor/thalassa-echappee-breizh-ce-venredi-dans-les-cotes-d-armor-954961.html',
'md5': 'f485bda6e185e7d15dbc69b72bae993e',
'info_dict': {
'id': 'NI_657393',
'ext': 'mp4',
'title': 'Olivier Monthus, réalisateur de "Bretagne, le choix de lArmor"',
'description': 'md5:a3264114c9d29aeca11ced113c37b16c',
'thumbnail': 're:^https?://.*\.jpe?g$',
'timestamp': 1458300695,
'upload_date': '20160318',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
@@ -172,7 +200,9 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
return self.url_result(dmcloud_url, 'DailymotionCloud')
video_id, catalogue = self._search_regex(
r'id-video=([^@]+@[^"]+)', webpage, 'video id').split('@')
(r'id-video=([^@]+@[^"]+)',
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"'),
webpage, 'video id').split('@')
return self._extract_video(video_id, catalogue)

Some files were not shown because too many files have changed in this diff Show More