Compare commits

...

92 Commits

Author SHA1 Message Date
Sergey M․
3acff9423d release 2016.09.18 2016-09-18 17:16:55 +07:00
Sergey M․
9ca93b99d1 [ChangeLog] Actualize 2016-09-18 17:15:22 +07:00
Sergey M․
14ae11efab [vyborymos] Add extractor (Closes #10692) 2016-09-18 16:56:40 +07:00
Sergey M․
190d2027d0 [xfileshare] Add title regex for streamin.to and fallback to video id (Closes #10646) 2016-09-18 07:22:06 +07:00
Sergey M․
26394d021d [globo:article] Add support for multiple videos (Closes #10653) 2016-09-17 23:34:10 +07:00
Sergey M․
30d0b549be [extractor/common] Add manifest_url for hls and hds formats 2016-09-17 21:33:38 +07:00
Sergey M․
86f4d14f81 Refactor fragments interface and dash segments downloader
- Eliminate segment_urls and initialization_url
+ Introduce manifest_url (a manifest may also contain unfragmented data, in which case url is used for the direct media URL and manifest_url for the manifest itself)
* Rewrite dashsegments downloader to use fragments data
* Improve generic mpd extraction
2016-09-17 20:35:22 +07:00
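The interface change this commit describes can be sketched in isolation. The following is illustrative only — `build_dash_format` is a hypothetical helper, not youtube-dl code — showing how a fragmented format now carries `manifest_url` for the manifest itself plus a `fragments` list (one dict per downloadable piece) in place of the old flat `segment_urls` and `initialization_url` fields:

```python
# Illustrative sketch of the fragments interface described in the commit
# message; build_dash_format is a hypothetical helper, not youtube-dl code.
def build_dash_format(manifest_url, fragment_urls):
    return {
        'format_id': 'dash-video',
        'protocol': 'http_dash_segments',
        # URL of the MPD manifest itself, kept separate from the media URLs
        'manifest_url': manifest_url,
        # one dict per fragment replaces the old flat segment_urls list
        'fragments': [{'url': u} for u in fragment_urls],
    }

fmt = build_dash_format(
    'https://example.com/video.mpd',
    ['https://example.com/init.mp4', 'https://example.com/seg1.m4s'],
)
print(len(fmt['fragments']))  # 2
```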
Sergey M․
21d21b0c72 [svt] Fix DASH formats extraction 2016-09-17 19:25:31 +07:00
Sergey M․
b4c1d6e800 [extractor/common] Expose fragments interface for dashsegments formats 2016-09-17 18:31:18 +07:00
Sergey M․
a0d5077c8d [extractor/common] Introduce fragments interface 2016-09-17 18:31:09 +07:00
Yen Chi Hsuan
584d6f3457 [thisav] Recognize jwplayers (closes #10447) 2016-09-17 18:46:43 +08:00
Yen Chi Hsuan
e14c82bd6b [jwplatform] Use js_to_json to detect more JWPlayers 2016-09-17 18:45:08 +08:00
Sergey M․
c51a7f0b2f [franceinter] Fix upload date extraction 2016-09-17 15:44:37 +07:00
Remita Amine
d05ef09d9d [mangomolo] fix domain regex 2016-09-17 08:11:01 +01:00
Remita Amine
30d9e20938 [postprocessor/ffmpeg] apply FFmpegFixupM3u8PP only for videos with aac codec(#5591) 2016-09-16 22:06:55 +01:00
Remita Amine
fc86d4eed0 [mangomolo] fix typo 2016-09-16 20:10:47 +01:00
Remita Amine
7d273a387a [mangomolo] add support for Mangomolo embeds 2016-09-16 19:31:39 +01:00
Remita Amine
6ad0219556 [common] add helper method for Wowza Streaming Engine format extraction 2016-09-16 19:30:38 +01:00
Remita Amine
98b7506e96 [toutv] add support for authentication(closes #10669) 2016-09-16 17:40:15 +01:00
Sergey M․
52dc8a9b3f [franceinter] Fix upload date extraction 2016-09-16 22:02:59 +07:00
Sergey M․
9d8985a165 [tv4] Fix hls and hds formats (Closes #10659) 2016-09-16 00:54:34 +07:00
Sergey M․
f5e008d134 release 2016.09.15 2016-09-15 23:46:11 +07:00
Sergey M․
e6bf3621e7 [ChangeLog] Actualize 2016-09-15 23:31:16 +07:00
stepshal
490b755769 Improve some id regexes 2016-09-15 23:12:58 +07:00
Sergey M․
1dec2c8a0e [adobepass] Change mvpd cache section name
In order to better emphasize its relation to Adobe Pass
2016-09-15 22:47:45 +07:00
Sergey M․
dcce092e0a [extractor/common] Simplify _get_netrc_login_info and carry long lines 2016-09-15 22:35:12 +07:00
Sergey M․
32443dd346 [extractor/common] Update _get_login_info's comment 2016-09-15 22:34:29 +07:00
Sergey M․
2133565cec [extractor/common] Simplify _get_login_info 2016-09-15 22:26:37 +07:00
Sergey M․
1da50aa34e [YoutubeDL] Improve Adobe Pass options' wording 2016-09-15 22:24:55 +07:00
Sergey M․
d2522b86ac [options] Actually print Adobe Pass options sections in --help 2016-09-15 22:18:31 +07:00
Sergey M․
537f753399 [options] Improve Adobe Pass wording 2016-09-15 22:17:17 +07:00
Sergey M․
c849836854 [utils] Improve _hidden_inputs 2016-09-15 21:54:48 +07:00
Sergey M․
eb5b1fc021 [crunchyroll] Fix authentication (Closes #10655) 2016-09-15 21:53:35 +07:00
Sergey M․
95be29e1c6 [twitch] Fix api calls (Closes #10654, closes #10660) 2016-09-15 20:58:02 +07:00
Remita Amine
c035dba19e [bellmedia] add support for more sites 2016-09-15 08:12:12 +01:00
Remita Amine
87148bb711 [adobepass] rename --ap-mso-list option to --ap-list-mso 2016-09-14 20:21:09 +01:00
Remita Amine
797c636bcb [ap] improve adobe pass names and parse error handling 2016-09-14 18:58:47 +01:00
Sergey M․
0002962f3f [franceinter] Improve extraction (Closes #10538) 2016-09-14 23:59:38 +07:00
Sergey M․
3e4185c396 [utils] Use native french month names 2016-09-14 23:59:38 +07:00
Sergey M․
f6717dec8a [utils] Improve month_by_name and add tests 2016-09-14 23:59:38 +07:00
renalid
a942d6cb48 [utils,franceinter] Add french months' names and fix extraction
Update of the "FranceInter" radio extractor: the webpage HTML structure
had changed and the extractor no longer worked, so I updated it to
get the mp3 URL and all details.
2016-09-14 23:59:38 +07:00
Yen Chi Hsuan
961516bfd1 [kuwo:song] Improve error detection (closes #10650) 2016-09-15 00:56:15 +08:00
Yen Chi Hsuan
6db354a9f4 [kuwo] Update _TESTS 2016-09-15 00:53:04 +08:00
Remita Amine
353f340e11 [go] fix typo 2016-09-14 17:22:42 +01:00
Remita Amine
014b7e6b25 [go] add support for free full episodes(#10439) 2016-09-14 17:08:25 +01:00
stepshal
925194022c Improve some _VALID_URLs 2016-09-14 22:47:21 +07:00
Sergey M․
b690ea15eb [viafree] Fix test 2016-09-14 22:45:23 +07:00
Remita Amine
5712c0f426 [adobepass] remove unnecessary option 2016-09-14 16:37:21 +01:00
Yen Chi Hsuan
86d68f906e [bilibili] Fix extraction for videos without backup_url (#10647) 2016-09-14 22:11:49 +08:00
Yen Chi Hsuan
4875ff6847 [bilibili] Remove copyrighted test cases
I can't find any English or Chinese material that claims BiliBili has
bought legal redistribution permissions for copyrighted products from
copyright holders.

References for removed test cases:
"刀语": https://en.wikipedia.org/wiki/Katanagatari, by White Fox
"哆啦A梦": https://en.wikipedia.org/wiki/Doraemon, by Shin-Ei Animation
"岳父岳母真难当": https://en.wikipedia.org/wiki/Serial_(Bad)_Weddings, by Les films du 24
"混沌武士": https://en.wikipedia.org/wiki/Samurai_Champloo, by Manglobe

I shouldn't have added them to _TESTS
2016-09-14 22:09:43 +08:00
Remita Amine
1b6712ab23 [adobepass] add specific options for adobe pass authentication
- add --ap-username and --ap-password option to specify
TV provider username and password in the cmd line
- add --ap-retries option to limit the number of retries
- add --ap-mso-list to list the supported TV Providers
2016-09-13 22:16:01 +01:00
Sergey M․
8414c2da31 [adobepass] PEP 8 2016-09-13 23:22:16 +07:00
Sergey M․
45396dd2ed [nhk] Fix extraction (Closes #10633) 2016-09-13 23:20:25 +07:00
Remita Amine
7a7309219c [adobepass] add an option to specify mso_id and support for ROGERS TV Provider(closes #10606) 2016-09-12 23:39:35 +01:00
Sergey M․
fcba157e80 [ISSUE_TEMPLATE_tmpl.md] Fix typo 2016-09-12 23:29:43 +07:00
Sergey M․
a6ccc3e518 [safari] Improve ids regexes (#10617) 2016-09-12 23:05:52 +07:00
Sergey M․
1d16035bb4 [kaltura] Improve audio detection 2016-09-12 22:43:45 +07:00
Sergey M․
e8bcd982cc [kaltura] Skip chun format 2016-09-12 22:33:00 +07:00
Sergey M․
a5ff05df1a [extractor/generic] Add vimeo embed that requires Referer passed 2016-09-12 21:49:31 +07:00
Sergey M․
d002e91986 [vimeo:ondemand] Pass Referer along with embed URL (#10624) 2016-09-12 21:48:45 +07:00
Sergey M․
546edb2efa [ISSUE_TEMPLATE_tmpl.md] Fix typo 2016-09-12 21:01:31 +07:00
Yen Chi Hsuan
be45730226 [nbc] Add new extractor for NBC Olympics (#10295, #10361) 2016-09-12 02:55:15 +08:00
Sergey M․
ee7e672eb0 [tube8] Remove proxy settings from test 2016-09-11 23:46:50 +07:00
Sergey M․
0307d6fba6 release 2016.09.11.1 2016-09-11 23:33:20 +07:00
Sergey M․
fc150cba1d [devscripts/release.sh] Add missing fi 2016-09-11 23:32:01 +07:00
Sergey M․
d667ab7fad [ChangeLog] Actualize 2016-09-11 23:30:18 +07:00
Sergey M․
eb87d4545a [devscripts/release.sh] Add ChangeLog reminder prompt 2016-09-11 23:29:25 +07:00
Sergey M․
1c81476cbb release 2016.09.11 2016-09-11 23:20:09 +07:00
Sergey M․
bc9186c882 [tvplay] Remove unused import 2016-09-11 22:51:12 +07:00
Sergey M․
6599c72527 [tube8] Extract categories and tags (Closes #10579) 2016-09-11 22:50:36 +07:00
Yen Chi Hsuan
6bb05b32a9 [pornhub] Extract categories and tags (closes #10499) 2016-09-11 19:22:51 +08:00
Yen Chi Hsuan
fea74acad8 [foxnews] Revert to old extractor names 2016-09-11 18:54:24 +08:00
Yen Chi Hsuan
f01115c933 [openload] Temporary fix (#10408) 2016-09-11 18:36:59 +08:00
Yen Chi Hsuan
2cdbc06a1f [foxnews] Support Fox News Articles (closes #10598) 2016-09-11 18:32:45 +08:00
Sergey M․
2cb93afcd8 [viafree] Improve video id extraction (Closes #10615) 2016-09-11 14:59:14 +07:00
Yen Chi Hsuan
bfcda07a27 [abc:iview] Skip the test. They are removed soon 2016-09-11 04:06:00 +08:00
Yen Chi Hsuan
001a5fd3d7 [iwara] Fix extraction after relaunch
Closes #10462, closes #3215
2016-09-11 03:02:00 +08:00
Remita Amine
1e35999c1e [tfo] Add new extractor 2016-09-10 19:43:31 +01:00
Sergey M․
2512b17493 [lrt] Fix audio extraction (Closes #10566) 2016-09-11 01:27:20 +07:00
Sergey M․
56c0ead4d3 [9now] Improve video data extraction (Closes #10561) 2016-09-11 00:42:13 +07:00
Scott Leggett
7324243750 [9now] Fix extraction 2016-09-11 00:16:29 +07:00
Sergey M․
84a18e9b90 [polskieradio:category] Improve extraction 2016-09-10 22:01:49 +07:00
Sergey M․
b29f842e0e [canalplus] Add support for c8.fr (Closes #10577) 2016-09-10 20:46:45 +07:00
Sergey M․
f009fcac0d Merge branch 'master' of github.com:rg3/youtube-dl 2016-09-10 19:21:03 +07:00
Yen Chi Hsuan
6c3affcb18 [newgrounds] Fix uploader extraction
Closes #10584

Also change test URLs to HTTPS, as proposed by
@stepshal in #10593.

Closes #10593
2016-09-10 20:09:09 +08:00
Sergey M․
1e19ff2984 Merge branch 'polskie-radio-programme' of https://github.com/JakubAdamWieczorek/youtube-dl 2016-09-10 00:42:36 +07:00
Sergey M․
c6129feb7f [ketnet] Add extractor (Closes #10343) 2016-09-09 23:20:45 +07:00
Sergey M․
bb5ebd4453 [canvas] Add support for een.be (Closes #10605) 2016-09-09 22:16:21 +07:00
Remita Amine
cb9cbd84ed [extractors] add import for TeleQuebecIE 2016-09-08 22:55:27 +01:00
Remita Amine
4d5726b0d7 [telequebec] Add new extractor(closes #1999) 2016-09-08 22:53:44 +01:00
Remita Amine
4614ad7b59 [parliamentliveuk] fix extraction(closes #9137) 2016-09-08 20:46:12 +01:00
Jakub Adam Wieczorek
8d3737cda7 [polskieradio] Add support for downloading whole programmes.
This extends the Polskie Radio (the Polish national radio) extractor to
enable the user to download all the broadcasts of a single programme.
2016-09-06 21:34:44 +02:00
125 changed files with 1714 additions and 720 deletions

View File

@@ -6,8 +6,8 @@
 ---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.09.08*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.09.08**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.09.18*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.09.18**
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.09.08
+[debug] youtube-dl version 2016.09.18
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
@@ -55,4 +55,4 @@ $ youtube-dl -v <your command line>
 ### Description of your *issue*, suggested solution and other information
 Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
-If work on your *issue* required an account credentials please provide them or explain how one can obtain them.
+If work on your *issue* requires account credentials please provide them or explain how one can obtain them.

View File

@@ -55,4 +55,4 @@ $ youtube-dl -v <your command line>
 ### Description of your *issue*, suggested solution and other information
 Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
-If work on your *issue* required an account credentials please provide them or explain how one can obtain them.
+If work on your *issue* requires account credentials please provide them or explain how one can obtain them.

View File

@@ -1,3 +1,72 @@
version 2016.09.18
Core
+ Introduce manifest_url and fragments fields in formats dictionary for
  fragmented media
+ Provide manifest_url field for DASH segments, HLS and HDS
+ Provide fragments field for DASH segments
* Rework DASH segments downloader to use fragments field
+ Add helper method for Wowza Streaming Engine formats extraction
Extractors
+ [vyborymos] Add extractor for vybory.mos.ru (#10692)
+ [xfileshare] Add title regular expression for streamin.to (#10646)
+ [globo:article] Add support for multiple videos (#10653)
+ [thisav] Recognize HTML5 videos (#10447)
* [jwplatform] Improve JWPlayer detection
+ [mangomolo] Add support for Mangomolo embeds
+ [toutv] Add support for authentication (#10669)
* [franceinter] Fix upload date extraction
* [tv4] Fix HLS and HDS formats extraction (#10659)
version 2016.09.15
Core
* Improve _hidden_inputs
+ Introduce improved explicit Adobe Pass support
+ Add --ap-mso to provide multiple-system operator identifier
+ Add --ap-username to provide MSO account username
+ Add --ap-password to provide MSO account password
+ Add --ap-list-mso to list all supported MSOs
+ Add support for Rogers Cable multiple-system operator (#10606)
Extractors
* [crunchyroll] Fix authentication (#10655)
* [twitch] Fix API calls (#10654, #10660)
+ [bellmedia] Add support for more Bell Media Television sites
* [franceinter] Fix extraction (#10538, #2105)
* [kuwo] Improve error detection (#10650)
+ [go] Add support for free full episodes (#10439)
* [bilibili] Fix extraction for specific videos (#10647)
* [nhk] Fix extraction (#10633)
* [kaltura] Improve audio detection
* [kaltura] Skip chun format
+ [vimeo:ondemand] Pass Referer along with embed URL (#10624)
+ [nbc] Add support for NBC Olympics (#10361)
version 2016.09.11.1
Extractors
+ [tube8] Extract categories and tags (#10579)
+ [pornhub] Extract categories and tags (#10499)
* [openload] Temporary fix (#10408)
+ [foxnews] Add support for Fox News articles (#10598)
* [viafree] Improve video id extraction (#10615)
* [iwara] Fix extraction after relaunch (#10462, #3215)
+ [tfo] Add extractor for tfo.org
* [lrt] Fix audio extraction (#10566)
* [9now] Fix extraction (#10561)
+ [canalplus] Add support for c8.fr (#10577)
* [newgrounds] Fix uploader extraction (#10584)
+ [polskieradio:category] Add support for category lists (#10576)
+ [ketnet] Add extractor for ketnet.be (#10343)
+ [canvas] Add support for een.be (#10605)
+ [telequebec] Add extractor for telequebec.tv (#1999)
* [parliamentliveuk] Fix extraction (#9137)
version 2016.09.08
Extractors

View File

@@ -358,6 +358,17 @@ which means you can modify it, redistribute it or use it however you like.
     -n, --netrc                      Use .netrc authentication data
     --video-password PASSWORD        Video password (vimeo, smotri, youku)
+
+## Adobe Pass Options:
+    --ap-mso MSO                     Adobe Pass multiple-system operator (TV
+                                     provider) identifier, use --ap-list-mso for
+                                     a list of available MSOs
+    --ap-username USERNAME           Multiple-system operator account login
+    --ap-password PASSWORD           Multiple-system operator account password.
+                                     If this option is left out, youtube-dl will
+                                     ask interactively.
+    --ap-list-mso                    List all supported multiple-system
+                                     operators
+
 ## Post-processing Options:
     -x, --extract-audio              Convert video files to audio-only files
                                      (requires ffmpeg or avconv and ffprobe or

View File

@@ -60,6 +60,9 @@ if ! type pandoc >/dev/null 2>/dev/null; then echo 'ERROR: pandoc is missing'; e
 if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi
 if ! python3 -c 'import wheel' 2>/dev/null; then echo 'ERROR: wheel is missing'; exit 1; fi
 
+read -p "Is ChangeLog up to date? (y/n) " -n 1
+if [[ ! $REPLY =~ ^[Yy]$ ]]; then exit 1; fi
+
 /bin/echo -e "\n### First of all, testing..."
 make clean
 if $skip_tests ; then

View File

@@ -89,6 +89,7 @@
 - **BeatportPro**
 - **Beeg**
 - **BehindKink**
+- **BellMedia**
 - **Bet**
 - **Bigflix**
 - **Bild**: Bild.de
@@ -169,7 +170,6 @@
 - **CSNNE**
 - **CSpan**: C-SPAN
 - **CtsNews**: 華視新聞
-- **CTV**
 - **CTVNews**
 - **culturebox.francetvinfo.fr**
 - **CultureUnplugged**
@@ -247,7 +247,8 @@
 - **Formula1**
 - **FOX**
 - **Foxgay**
-- **FoxNews**: Fox News and Fox Business Video
+- **foxnews**: Fox News and Fox Business Video
+- **foxnews:article**
 - **foxnews:insider**
 - **FoxSports**
 - **france2.fr:generation-quoi**
@@ -326,6 +327,7 @@
 - **ivi**: ivi.ru
 - **ivi:compilation**: ivi.ru compilations
 - **ivideon**: Ivideon TV
+- **Iwara**
 - **Izlesene**
 - **JeuxVideo**
 - **Jove**
@@ -339,6 +341,7 @@
 - **KarriereVideos**
 - **keek**
 - **KeezMovies**
+- **Ketnet**
 - **KhanAcademy**
 - **KickStarter**
 - **KonserthusetPlay**
@@ -385,6 +388,8 @@
 - **mailru**: Видео@Mail.Ru
 - **MakersChannel**
 - **MakerTV**
+- **mangomolo:live**
+- **mangomolo:video**
 - **MatchTV**
 - **MDR**: MDR.DE and KiKA
 - **media.ccc.de**
@@ -442,6 +447,7 @@
 - **NBA**
 - **NBC**
 - **NBCNews**
+- **NBCOlympics**
 - **NBCSports**
 - **NBCSportsVPlayer**
 - **ndr**: NDR.de - Norddeutscher Rundfunk
@@ -540,6 +546,7 @@
 - **podomatic**
 - **Pokemon**
 - **PolskieRadio**
+- **PolskieRadioCategory**
 - **PornCom**
 - **PornHd**
 - **PornHub**: PornHub and Thumbzilla
@@ -701,9 +708,11 @@
 - **Telecinco**: telecinco.es, cuatro.com and mediaset.es
 - **Telegraaf**
 - **TeleMB**
+- **TeleQuebec**
 - **TeleTask**
 - **Telewebion**
 - **TF1**
+- **TFO**
 - **TheIntercept**
 - **ThePlatform**
 - **ThePlatformFeed**
@@ -725,7 +734,6 @@
 - **ToypicsUser**: Toypics user profile
 - **TrailerAddict** (Currently broken)
 - **Trilulilu**
-- **trollvids**
 - **TruTV**
 - **Tube8**
 - **TubiTv**
@@ -843,6 +851,7 @@
 - **VRT**
 - **vube**: Vube.com
 - **VuClip**
+- **VyboryMos**
 - **Walla**
 - **washingtonpost**
 - **washingtonpost:article**

View File

@@ -40,6 +40,7 @@ from youtube_dl.utils import (
     js_to_json,
     limit_length,
     mimetype2ext,
+    month_by_name,
     ohdave_rsa_encrypt,
     OnDemandPagedList,
     orderedSet,
@@ -634,6 +635,14 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(mimetype2ext('text/vtt;charset=utf-8'), 'vtt')
         self.assertEqual(mimetype2ext('text/html; charset=utf-8'), 'html')
 
+    def test_month_by_name(self):
+        self.assertEqual(month_by_name(None), None)
+        self.assertEqual(month_by_name('December', 'en'), 12)
+        self.assertEqual(month_by_name('décembre', 'fr'), 12)
+        self.assertEqual(month_by_name('December'), 12)
+        self.assertEqual(month_by_name('décembre'), None)
+        self.assertEqual(month_by_name('Unknown', 'unknown'), None)
+
     def test_parse_codecs(self):
         self.assertEqual(parse_codecs(''), {})
         self.assertEqual(parse_codecs('avc1.77.30, mp4a.40.2'), {
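A stand-alone sketch of a `month_by_name` consistent with the test cases above; youtube-dl's real implementation may differ in detail. The month-name tables are plain calendar data:

```python
# Sketch of month_by_name matching the tests in the diff above: look a
# localized month name up in a per-language table, falling back to English
# for unknown languages, and return the 1-based month number or None.
MONTH_NAMES = {
    'en': ['January', 'February', 'March', 'April', 'May', 'June', 'July',
           'August', 'September', 'October', 'November', 'December'],
    'fr': ['janvier', 'février', 'mars', 'avril', 'mai', 'juin', 'juillet',
           'août', 'septembre', 'octobre', 'novembre', 'décembre'],
}

def month_by_name(name, lang='en'):
    """Return the 1-based month number for a localized name, or None."""
    # unknown languages fall back to English, matching the 'unknown' test case
    month_names = MONTH_NAMES.get(lang, MONTH_NAMES['en'])
    try:
        return month_names.index(name) + 1
    except ValueError:
        return None
```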

View File

@@ -131,6 +131,9 @@ class YoutubeDL(object):
     username:          Username for authentication purposes.
     password:          Password for authentication purposes.
     videopassword:     Password for accessing a video.
+    ap_mso:            Adobe Pass multiple-system operator identifier.
+    ap_username:       Multiple-system operator account username.
+    ap_password:       Multiple-system operator account password.
     usenetrc:          Use netrc for authentication instead.
     verbose:           Print additional info to stdout.
     quiet:             Do not print messages to stdout.

View File

@@ -34,12 +34,14 @@ from .utils import (
     setproctitle,
     std_headers,
     write_string,
+    render_table,
 )
 from .update import update_self
 from .downloader import (
     FileDownloader,
 )
 from .extractor import gen_extractors, list_extractors
+from .extractor.adobepass import MSO_INFO
 from .YoutubeDL import YoutubeDL
@@ -118,18 +120,26 @@ def _real_main(argv=None):
                 desc += ' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES))
             write_string(desc + '\n', out=sys.stdout)
         sys.exit(0)
+    if opts.ap_list_mso:
+        table = [[mso_id, mso_info['name']] for mso_id, mso_info in MSO_INFO.items()]
+        write_string('Supported TV Providers:\n' + render_table(['mso', 'mso name'], table) + '\n', out=sys.stdout)
+        sys.exit(0)
 
     # Conflicting, missing and erroneous options
     if opts.usenetrc and (opts.username is not None or opts.password is not None):
         parser.error('using .netrc conflicts with giving username/password')
     if opts.password is not None and opts.username is None:
         parser.error('account username missing\n')
+    if opts.ap_password is not None and opts.ap_username is None:
+        parser.error('TV Provider account username missing\n')
     if opts.outtmpl is not None and (opts.usetitle or opts.autonumber or opts.useid):
         parser.error('using output template conflicts with using title, video ID or auto number')
     if opts.usetitle and opts.useid:
         parser.error('using title conflicts with using video ID')
     if opts.username is not None and opts.password is None:
         opts.password = compat_getpass('Type account password and press [Return]: ')
+    if opts.ap_username is not None and opts.ap_password is None:
+        opts.ap_password = compat_getpass('Type TV provider account password and press [Return]: ')
     if opts.ratelimit is not None:
         numeric_limit = FileDownloader.parse_bytes(opts.ratelimit)
         if numeric_limit is None:
@@ -155,6 +165,8 @@ def _real_main(argv=None):
             parser.error('max sleep interval must be greater than or equal to min sleep interval')
     else:
         opts.max_sleep_interval = opts.sleep_interval
+    if opts.ap_mso and opts.ap_mso not in MSO_INFO:
+        parser.error('Unsupported TV Provider, use --ap-list-mso to get a list of supported TV Providers')
 
     def parse_retries(retries):
         if retries in ('inf', 'infinite'):
@@ -293,6 +305,9 @@ def _real_main(argv=None):
         'password': opts.password,
         'twofactor': opts.twofactor,
         'videopassword': opts.videopassword,
+        'ap_mso': opts.ap_mso,
+        'ap_username': opts.ap_username,
+        'ap_password': opts.ap_password,
         'quiet': (opts.quiet or any_getting or any_printing),
         'no_warnings': opts.no_warnings,
         'forceurl': opts.geturl,
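To see what the new `--ap-list-mso` code path produces, here is a self-contained sketch. `render_table` here is a minimal stand-in for `youtube_dl.utils.render_table` (whose exact formatting may differ), and the two entries mirror the `MSO_INFO` mapping shown in the adobepass diff:

```python
# Sketch of the --ap-list-mso code path: render the MSO_INFO mapping as a
# two-column table of MSO ids and provider names.
MSO_INFO = {
    'DTV': {'name': 'DirecTV'},
    'Rogers': {'name': 'Rogers Cable'},
}

def render_table(header, data):
    # minimal stand-in for youtube_dl.utils.render_table: pad each column
    # to the width of its longest cell
    rows = [header] + data
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return '\n'.join(
        '  '.join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows)

table = [[mso_id, info['name']] for mso_id, info in sorted(MSO_INFO.items())]
print('Supported TV Providers:')
print(render_table(['mso', 'mso name'], table))
```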

View File

@@ -1,7 +1,6 @@
 from __future__ import unicode_literals
 
 import os
-import re
 
 from .fragment import FragmentFD
 from ..compat import compat_urllib_error
@@ -19,34 +18,32 @@ class DashSegmentsFD(FragmentFD):
     FD_NAME = 'dashsegments'
 
     def real_download(self, filename, info_dict):
-        base_url = info_dict['url']
-        segment_urls = [info_dict['segment_urls'][0]] if self.params.get('test', False) else info_dict['segment_urls']
-        initialization_url = info_dict.get('initialization_url')
+        segments = info_dict['fragments'][:1] if self.params.get(
+            'test', False) else info_dict['fragments']
 
         ctx = {
             'filename': filename,
-            'total_frags': len(segment_urls) + (1 if initialization_url else 0),
+            'total_frags': len(segments),
         }
 
         self._prepare_and_start_frag_download(ctx)
 
-        def combine_url(base_url, target_url):
-            if re.match(r'^https?://', target_url):
-                return target_url
-            return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
-
         segments_filenames = []
 
         fragment_retries = self.params.get('fragment_retries', 0)
         skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
 
-        def process_segment(segment, tmp_filename, fatal):
-            target_url, segment_name = segment
+        def process_segment(segment, tmp_filename, num):
+            segment_url = segment['url']
+            segment_name = 'Frag%d' % num
             target_filename = '%s-%s' % (tmp_filename, segment_name)
+            # In DASH, the first segment contains necessary headers to
+            # generate a valid MP4 file, so always abort for the first segment
+            fatal = num == 0 or not skip_unavailable_fragments
             count = 0
             while count <= fragment_retries:
                 try:
-                    success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)})
+                    success = ctx['dl'].download(target_filename, {'url': segment_url})
                     if not success:
                         return False
                     down, target_sanitized = sanitize_open(target_filename, 'rb')
@@ -72,16 +69,8 @@ class DashSegmentsFD(FragmentFD):
                 return False
             return True
 
-        segments_to_download = [(initialization_url, 'Init')] if initialization_url else []
-        segments_to_download.extend([
-            (segment_url, 'Seg%d' % i)
-            for i, segment_url in enumerate(segment_urls)])
-
-        for i, segment in enumerate(segments_to_download):
-            # In DASH, the first segment contains necessary headers to
-            # generate a valid MP4 file, so always abort for the first segment
-            fatal = i == 0 or not skip_unavailable_fragments
-            if not process_segment(segment, ctx['tmpfilename'], fatal):
+        for i, segment in enumerate(segments):
+            if not process_segment(segment, ctx['tmpfilename'], i):
                 return False
 
         self._finish_frag_download(ctx)
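The control flow of the reworked downloader can be modelled in isolation. In this simplified sketch `download_one` stands in for the real per-fragment HTTP download; the fatal/skip decision mirrors the `fatal = num == 0 or not skip_unavailable_fragments` line in the diff:

```python
# Simplified model of the reworked fragment download loop: a failure is
# fatal for the first fragment (it carries the MP4 headers) or when
# skipping unavailable fragments is disabled; otherwise the fragment is
# skipped and the download continues.
def download_fragments(fragments, download_one, skip_unavailable_fragments=True):
    downloaded = []
    for num, fragment in enumerate(fragments):
        fatal = num == 0 or not skip_unavailable_fragments
        if not download_one(fragment['url']):
            if fatal:
                return None  # abort the whole download
            continue  # skip just this unavailable fragment
        downloaded.append(fragment['url'])
    return downloaded
```

With skipping enabled, a failed middle fragment is dropped and the rest are kept; a failed first fragment aborts the whole download, matching the comment carried over from the old loop.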

View File

@@ -13,7 +13,7 @@ from ..utils import (
 
 class ABCIE(InfoExtractor):
     IE_NAME = 'abc.net.au'
-    _VALID_URL = r'https?://www\.abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
 
     _TESTS = [{
         'url': 'http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334',
@@ -100,6 +100,7 @@ class ABCIViewIE(InfoExtractor):
     IE_NAME = 'abc.net.au:iview'
     _VALID_URL = r'https?://iview\.abc\.net\.au/programs/[^/]+/(?P<id>[^/?#]+)'
 
+    # ABC iview programs are normally available for 14 days only.
     _TESTS = [{
         'url': 'http://iview.abc.net.au/programs/gardening-australia/FA1505V024S00',
         'md5': '979d10b2939101f0d27a06b79edad536',
@@ -112,6 +113,7 @@ class ABCIViewIE(InfoExtractor):
             'uploader_id': 'abc1',
             'timestamp': 1471719600,
         },
+        'skip': 'Video gone',
     }]
 
     def _real_extract(self, url):


@@ -6,16 +6,33 @@ import time
 import xml.etree.ElementTree as etree
 
 from .common import InfoExtractor
+from ..compat import compat_urlparse
 from ..utils import (
     unescapeHTML,
     urlencode_postdata,
     unified_timestamp,
+    ExtractorError,
 )
 
+MSO_INFO = {
+    'DTV': {
+        'name': 'DirecTV',
+        'username_field': 'username',
+        'password_field': 'password',
+    },
+    'Rogers': {
+        'name': 'Rogers Cable',
+        'username_field': 'UserName',
+        'password_field': 'UserPassword',
+    },
+}
+
 
 class AdobePassIE(InfoExtractor):
     _SERVICE_PROVIDER_TEMPLATE = 'https://sp.auth.adobe.com/adobe-services/%s'
     _USER_AGENT = 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0'
+    _MVPD_CACHE = 'ap-mvpd'
 
     @staticmethod
     def _get_mvpd_resource(provider_id, title, guid, rating):
@@ -41,6 +58,24 @@ class AdobePassIE(InfoExtractor):
             token_expires = unified_timestamp(re.sub(r'[_ ]GMT', '', xml_text(token, date_ele)))
             return token_expires and token_expires <= int(time.time())
 
+        def post_form(form_page_res, note, data={}):
+            form_page, urlh = form_page_res
+            post_url = self._html_search_regex(r'<form[^>]+action=(["\'])(?P<url>.+?)\1', form_page, 'post url', group='url')
+            if not re.match(r'https?://', post_url):
+                post_url = compat_urlparse.urljoin(urlh.geturl(), post_url)
+            form_data = self._hidden_inputs(form_page)
+            form_data.update(data)
+            return self._download_webpage_handle(
+                post_url, video_id, note, data=urlencode_postdata(form_data), headers={
+                    'Content-Type': 'application/x-www-form-urlencoded',
+                })
+
+        def raise_mvpd_required():
+            raise ExtractorError(
+                'This video is only available for users of participating TV providers. '
+                'Use --ap-mso to specify Adobe Pass Multiple-system operator Identifier '
+                'and --ap-username and --ap-password or --netrc to provide account credentials.', expected=True)
+
         mvpd_headers = {
             'ap_42': 'anonymous',
             'ap_11': 'Linux i686',
@@ -49,89 +84,91 @@ class AdobePassIE(InfoExtractor):
         }
 
         guid = xml_text(resource, 'guid')
-        requestor_info = self._downloader.cache.load('mvpd', requestor_id) or {}
-        authn_token = requestor_info.get('authn_token')
-        if authn_token and is_expired(authn_token, 'simpleTokenExpires'):
-            authn_token = None
-        if not authn_token:
-            # TODO add support for other TV Providers
-            mso_id = 'DTV'
-            username, password = self._get_netrc_login_info(mso_id)
-            if not username or not password:
-                return ''
-
-            def post_form(form_page, note, data={}):
-                post_url = self._html_search_regex(r'<form[^>]+action=(["\'])(?P<url>.+?)\1', form_page, 'post url', group='url')
-                return self._download_webpage(
-                    post_url, video_id, note, data=urlencode_postdata(data or self._hidden_inputs(form_page)), headers={
-                        'Content-Type': 'application/x-www-form-urlencoded',
-                    })
-
-            provider_redirect_page = self._download_webpage(
-                self._SERVICE_PROVIDER_TEMPLATE % 'authenticate/saml', video_id,
-                'Downloading Provider Redirect Page', query={
-                    'noflash': 'true',
-                    'mso_id': mso_id,
-                    'requestor_id': requestor_id,
-                    'no_iframe': 'false',
-                    'domain_name': 'adobe.com',
-                    'redirect_url': url,
-                })
-            provider_login_page = post_form(
-                provider_redirect_page, 'Downloading Provider Login Page')
-            mvpd_confirm_page = post_form(provider_login_page, 'Logging in', {
-                'username': username,
-                'password': password,
-            })
-            post_form(mvpd_confirm_page, 'Confirming Login')
-
-            session = self._download_webpage(
-                self._SERVICE_PROVIDER_TEMPLATE % 'session', video_id,
-                'Retrieving Session', data=urlencode_postdata({
-                    '_method': 'GET',
-                    'requestor_id': requestor_id,
-                }), headers=mvpd_headers)
-            if '<pendingLogout' in session:
-                self._downloader.cache.store('mvpd', requestor_id, {})
-                return self._extract_mvpd_auth(url, video_id, requestor_id, resource)
-            authn_token = unescapeHTML(xml_text(session, 'authnToken'))
-            requestor_info['authn_token'] = authn_token
-            self._downloader.cache.store('mvpd', requestor_id, requestor_info)
-
-        authz_token = requestor_info.get(guid)
-        if authz_token and is_expired(authz_token, 'simpleTokenTTL'):
-            authz_token = None
-        if not authz_token:
-            authorize = self._download_webpage(
-                self._SERVICE_PROVIDER_TEMPLATE % 'authorize', video_id,
-                'Retrieving Authorization Token', data=urlencode_postdata({
-                    'resource_id': resource,
-                    'requestor_id': requestor_id,
-                    'authentication_token': authn_token,
-                    'mso_id': xml_text(authn_token, 'simpleTokenMsoID'),
-                    'userMeta': '1',
-                }), headers=mvpd_headers)
-            if '<pendingLogout' in authorize:
-                self._downloader.cache.store('mvpd', requestor_id, {})
-                return self._extract_mvpd_auth(url, video_id, requestor_id, resource)
-            authz_token = unescapeHTML(xml_text(authorize, 'authzToken'))
-            requestor_info[guid] = authz_token
-            self._downloader.cache.store('mvpd', requestor_id, requestor_info)
-
-        mvpd_headers.update({
-            'ap_19': xml_text(authn_token, 'simpleSamlNameID'),
-            'ap_23': xml_text(authn_token, 'simpleSamlSessionIndex'),
-        })
-
-        short_authorize = self._download_webpage(
-            self._SERVICE_PROVIDER_TEMPLATE % 'shortAuthorize',
-            video_id, 'Retrieving Media Token', data=urlencode_postdata({
-                'authz_token': authz_token,
-                'requestor_id': requestor_id,
-                'session_guid': xml_text(authn_token, 'simpleTokenAuthenticationGuid'),
-                'hashed_guid': 'false',
-            }), headers=mvpd_headers)
-        if '<pendingLogout' in short_authorize:
-            self._downloader.cache.store('mvpd', requestor_id, {})
-            return self._extract_mvpd_auth(url, video_id, requestor_id, resource)
-        return short_authorize
+        count = 0
+        while count < 2:
+            requestor_info = self._downloader.cache.load(self._MVPD_CACHE, requestor_id) or {}
+            authn_token = requestor_info.get('authn_token')
+            if authn_token and is_expired(authn_token, 'simpleTokenExpires'):
+                authn_token = None
+            if not authn_token:
+                # TODO add support for other TV Providers
+                mso_id = self._downloader.params.get('ap_mso')
+                if not mso_id:
+                    raise_mvpd_required()
+                username, password = self._get_login_info('ap_username', 'ap_password', mso_id)
+                if not username or not password:
+                    raise_mvpd_required()
+                mso_info = MSO_INFO[mso_id]
+
+                provider_redirect_page_res = self._download_webpage_handle(
+                    self._SERVICE_PROVIDER_TEMPLATE % 'authenticate/saml', video_id,
+                    'Downloading Provider Redirect Page', query={
+                        'noflash': 'true',
+                        'mso_id': mso_id,
+                        'requestor_id': requestor_id,
+                        'no_iframe': 'false',
+                        'domain_name': 'adobe.com',
+                        'redirect_url': url,
+                    })
+                provider_login_page_res = post_form(
+                    provider_redirect_page_res, 'Downloading Provider Login Page')
+                mvpd_confirm_page_res = post_form(provider_login_page_res, 'Logging in', {
+                    mso_info['username_field']: username,
+                    mso_info['password_field']: password,
+                })
+                if mso_id == 'DTV':
+                    post_form(mvpd_confirm_page_res, 'Confirming Login')
+
+            session = self._download_webpage(
+                self._SERVICE_PROVIDER_TEMPLATE % 'session', video_id,
+                'Retrieving Session', data=urlencode_postdata({
+                    '_method': 'GET',
+                    'requestor_id': requestor_id,
+                }), headers=mvpd_headers)
+            if '<pendingLogout' in session:
+                self._downloader.cache.store(self._MVPD_CACHE, requestor_id, {})
+                count += 1
+                continue
+            authn_token = unescapeHTML(xml_text(session, 'authnToken'))
+            requestor_info['authn_token'] = authn_token
+            self._downloader.cache.store(self._MVPD_CACHE, requestor_id, requestor_info)
+
+            authz_token = requestor_info.get(guid)
+            if authz_token and is_expired(authz_token, 'simpleTokenTTL'):
+                authz_token = None
+            if not authz_token:
+                authorize = self._download_webpage(
+                    self._SERVICE_PROVIDER_TEMPLATE % 'authorize', video_id,
+                    'Retrieving Authorization Token', data=urlencode_postdata({
+                        'resource_id': resource,
+                        'requestor_id': requestor_id,
+                        'authentication_token': authn_token,
+                        'mso_id': xml_text(authn_token, 'simpleTokenMsoID'),
+                        'userMeta': '1',
+                    }), headers=mvpd_headers)
+                if '<pendingLogout' in authorize:
+                    self._downloader.cache.store(self._MVPD_CACHE, requestor_id, {})
+                    count += 1
+                    continue
+                authz_token = unescapeHTML(xml_text(authorize, 'authzToken'))
+                requestor_info[guid] = authz_token
+                self._downloader.cache.store(self._MVPD_CACHE, requestor_id, requestor_info)
+
+            mvpd_headers.update({
+                'ap_19': xml_text(authn_token, 'simpleSamlNameID'),
+                'ap_23': xml_text(authn_token, 'simpleSamlSessionIndex'),
+            })
+
+            short_authorize = self._download_webpage(
+                self._SERVICE_PROVIDER_TEMPLATE % 'shortAuthorize',
+                video_id, 'Retrieving Media Token', data=urlencode_postdata({
+                    'authz_token': authz_token,
+                    'requestor_id': requestor_id,
+                    'session_guid': xml_text(authn_token, 'simpleTokenAuthenticationGuid'),
+                    'hashed_guid': 'false',
+                }), headers=mvpd_headers)
+            if '<pendingLogout' in short_authorize:
+                self._downloader.cache.store(self._MVPD_CACHE, requestor_id, {})
+                count += 1
+                continue
+            return short_authorize


@@ -4,7 +4,7 @@ from .common import InfoExtractor
 class AlJazeeraIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
 
     _TEST = {
         'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html',


@@ -50,25 +50,6 @@ class AWAANBaseIE(InfoExtractor):
             'is_live': is_live,
         }
 
-    def _extract_video_formats(self, webpage, video_id, m3u8_entry_protocol):
-        formats = []
-        format_url_base = 'http' + self._html_search_regex(
-            [
-                r'file\s*:\s*"https?(://[^"]+)/playlist.m3u8',
-                r'<a[^>]+href="rtsp(://[^"]+)"'
-            ], webpage, 'format url')
-        formats.extend(self._extract_mpd_formats(
-            format_url_base + '/manifest.mpd',
-            video_id, mpd_id='dash', fatal=False))
-        formats.extend(self._extract_m3u8_formats(
-            format_url_base + '/playlist.m3u8', video_id, 'mp4',
-            m3u8_entry_protocol, m3u8_id='hls', fatal=False))
-        formats.extend(self._extract_f4m_formats(
-            format_url_base + '/manifest.f4m',
-            video_id, f4m_id='hds', fatal=False))
-        self._sort_formats(formats)
-        return formats
-
 
 class AWAANVideoIE(AWAANBaseIE):
     IE_NAME = 'awaan:video'
@@ -99,16 +80,18 @@ class AWAANVideoIE(AWAANBaseIE):
             video_id, headers={'Origin': 'http://awaan.ae'})
         info = self._parse_video_data(video_data, video_id, False)
 
-        webpage = self._download_webpage(
-            'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' +
-            compat_urllib_parse_urlencode({
-                'id': video_data['id'],
-                'user_id': video_data['user_id'],
-                'signature': video_data['signature'],
-                'countries': 'Q0M=',
-                'filter': 'DENY',
-            }), video_id)
-        info['formats'] = self._extract_video_formats(webpage, video_id, 'm3u8_native')
+        embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' + compat_urllib_parse_urlencode({
+            'id': video_data['id'],
+            'user_id': video_data['user_id'],
+            'signature': video_data['signature'],
+            'countries': 'Q0M=',
+            'filter': 'DENY',
+        })
+        info.update({
+            '_type': 'url_transparent',
+            'url': embed_url,
+            'ie_key': 'MangomoloVideo',
+        })
         return info
 
@@ -138,16 +121,18 @@ class AWAANLiveIE(AWAANBaseIE):
             channel_id, headers={'Origin': 'http://awaan.ae'})
         info = self._parse_video_data(channel_data, channel_id, True)
 
-        webpage = self._download_webpage(
-            'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' +
-            compat_urllib_parse_urlencode({
-                'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
-                'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
-                'signature': channel_data['signature'],
-                'countries': 'Q0M=',
-                'filter': 'DENY',
-            }), channel_id)
-        info['formats'] = self._extract_video_formats(webpage, channel_id, 'm3u8')
+        embed_url = 'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' + compat_urllib_parse_urlencode({
+            'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
+            'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
+            'signature': channel_data['signature'],
+            'countries': 'Q0M=',
+            'filter': 'DENY',
+        })
+        info.update({
+            '_type': 'url_transparent',
+            'url': embed_url,
+            'ie_key': 'MangomoloLive',
+        })
         return info

View File

@@ -103,7 +103,7 @@ class AzubuIE(InfoExtractor):
 class AzubuLiveIE(InfoExtractor):
-    _VALID_URL = r'https?://www.azubu.tv/(?P<id>[^/]+)$'
+    _VALID_URL = r'https?://(?:www\.)?azubu\.tv/(?P<id>[^/]+)$'
 
     _TEST = {
         'url': 'http://www.azubu.tv/MarsTVMDLen',


@@ -1028,7 +1028,7 @@ class BBCIE(BBCCoUkIE):
 class BBCCoUkArticleIE(InfoExtractor):
-    _VALID_URL = r'https?://www.bbc.co.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
     IE_NAME = 'bbc.co.uk:article'
     IE_DESC = 'BBC articles'


@@ -6,8 +6,25 @@ import re
 from .common import InfoExtractor
 
 
-class CTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?P<domain>ctv|tsn|bnn|thecomedynetwork)\.ca/.*?(?:\bvid=|-vid|~|%7E)(?P<id>[0-9.]+)'
+class BellMediaIE(InfoExtractor):
+    _VALID_URL = r'''(?x)https?://(?:www\.)?
+        (?P<domain>
+            (?:
+                ctv|
+                tsn|
+                bnn|
+                thecomedynetwork|
+                discovery|
+                discoveryvelocity|
+                sciencechannel|
+                investigationdiscovery|
+                animalplanet|
+                bravo|
+                mtv|
+                space
+            )\.ca|
+            much\.com
+        )/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6})'''
     _TESTS = [{
         'url': 'http://www.ctv.ca/video/player?vid=706966',
         'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
@@ -32,15 +49,27 @@ class CTVIE(InfoExtractor):
     }, {
         'url': 'http://www.ctv.ca/YourMorning/Video/S1E6-Monday-August-29-2016-vid938009',
         'only_matching': True,
+    }, {
+        'url': 'http://www.much.com/shows/atmidnight/episode948007/tuesday-september-13-2016',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.much.com/shows/the-almost-impossible-gameshow/928979/episode-6',
+        'only_matching': True,
     }]
+    _DOMAINS = {
+        'thecomedynetwork': 'comedy',
+        'discoveryvelocity': 'discvel',
+        'sciencechannel': 'discsci',
+        'investigationdiscovery': 'invdisc',
+        'animalplanet': 'aniplan',
+    }
 
     def _real_extract(self, url):
         domain, video_id = re.match(self._VALID_URL, url).groups()
-        if domain == 'thecomedynetwork':
-            domain = 'comedy'
+        domain = domain.split('.')[0]
         return {
             '_type': 'url_transparent',
             'id': video_id,
-            'url': '9c9media:%s_web:%s' % (domain, video_id),
+            'url': '9c9media:%s_web:%s' % (self._DOMAINS.get(domain, domain), video_id),
             'ie_key': 'NineCNineMedia',
         }
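The rewritten `_real_extract` above reduces every supported site to a 9c9media destination code: keep only the first label of the matched domain (so `much.com` becomes `much`), then map renamed brands through `_DOMAINS` with the domain itself as fallback. A standalone sketch of that mapping — the helper name `nine_c_nine_url` is hypothetical:

```python
DOMAINS = {
    'thecomedynetwork': 'comedy',
    'discoveryvelocity': 'discvel',
    'sciencechannel': 'discsci',
    'investigationdiscovery': 'invdisc',
    'animalplanet': 'aniplan',
}

def nine_c_nine_url(domain, video_id):
    # 'much.com' matches with its TLD, so keep only the first label,
    # then translate rebranded sites to their 9c9media destination codes.
    domain = domain.split('.')[0]
    return '9c9media:%s_web:%s' % (DOMAINS.get(domain, domain), video_id)
```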


@@ -17,7 +17,7 @@ from ..utils import (
 class BiliBiliIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/v/)(?P<id>\d+)'
 
-    _TESTS = [{
+    _TEST = {
         'url': 'http://www.bilibili.tv/video/av1074402/',
         'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
         'info_dict': {
@@ -32,64 +32,7 @@ class BiliBiliIE(InfoExtractor):
             'uploader': '菊子桑',
             'uploader_id': '156160',
         },
-    }, {
-        'url': 'http://www.bilibili.com/video/av1041170/',
-        'info_dict': {
-            'id': '1041170',
-            'ext': 'mp4',
-            'title': '【BD1080P】刀语【诸神&异域】',
-            'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
-            'duration': 3382.259,
-            'timestamp': 1396530060,
-            'upload_date': '20140403',
-            'thumbnail': 're:^https?://.+\.jpg',
-            'uploader': '枫叶逝去',
-            'uploader_id': '520116',
-        },
-    }, {
-        'url': 'http://www.bilibili.com/video/av4808130/',
-        'info_dict': {
-            'id': '4808130',
-            'ext': 'mp4',
-            'title': '【长篇】哆啦A梦443【钉铛】',
-            'description': '(2016.05.27)来组合客人的脸吧&amp;amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;amp;illust_id=56912929',
-            'duration': 1493.995,
-            'timestamp': 1464564180,
-            'upload_date': '20160529',
-            'thumbnail': 're:^https?://.+\.jpg',
-            'uploader': '喜欢拉面',
-            'uploader_id': '151066',
-        },
-    }, {
-        # Missing upload time
-        'url': 'http://www.bilibili.com/video/av1867637/',
-        'info_dict': {
-            'id': '1867637',
-            'ext': 'mp4',
-            'title': '【HDTV】【喜剧】岳父岳母真难当 2014【法国票房冠军】',
-            'description': '一个信奉天主教的法国旧式传统资产阶级家庭中有四个女儿。三个女儿却分别找了阿拉伯、犹太、中国丈夫,老夫老妻唯独期盼剩下未嫁的小女儿能找一个信奉天主教的法国白人,结果没想到小女儿找了一位非裔黑人……【这次应该不会跳帧了】',
-            'duration': 5760.0,
-            'uploader': '黑夜为猫',
-            'uploader_id': '610729',
-            'thumbnail': 're:^https?://.+\.jpg',
-        },
-        'params': {
-            # Just to test metadata extraction
-            'skip_download': True,
-        },
-        'expected_warnings': ['upload time'],
-    }, {
-        'url': 'http://bangumi.bilibili.com/anime/v/40068',
-        'md5': '08d539a0884f3deb7b698fb13ba69696',
-        'info_dict': {
-            'id': '40068',
-            'ext': 'mp4',
-            'duration': 1402.357,
-            'title': '混沌武士 : 第7集 四面楚歌 A Risky Racket',
-            'description': 'md5:6a9622b911565794c11f25f81d6a97d2',
-            'thumbnail': 're:^http?://.+\.jpg',
-        },
-    }]
+    }
 
     _APP_KEY = '6f90a59ac58a4123'
     _BILIBILI_KEY = '0bfd84cc3940035173f35e6777508326'
@@ -124,7 +67,7 @@ class BiliBiliIE(InfoExtractor):
                 'url': durl['url'],
                 'filesize': int_or_none(durl['size']),
             }]
-            for backup_url in durl['backup_url']:
+            for backup_url in durl.get('backup_url', []):
                 formats.append({
                     'url': backup_url,
                     # backup URLs have lower priorities


@@ -12,7 +12,7 @@ from ..utils import (
 class BpbIE(InfoExtractor):
     IE_DESC = 'Bundeszentrale für politische Bildung'
-    _VALID_URL = r'https?://www\.bpb\.de/mediathek/(?P<id>[0-9]+)/'
+    _VALID_URL = r'https?://(?:www\.)?bpb\.de/mediathek/(?P<id>[0-9]+)/'
 
     _TEST = {
         'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr',


@@ -112,7 +112,7 @@ class CamdemyIE(InfoExtractor):
 class CamdemyFolderIE(InfoExtractor):
-    _VALID_URL = r'https?://www.camdemy.com/folder/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?camdemy\.com/folder/(?P<id>\d+)'
     _TESTS = [{
         # links with trailing slash
         'url': 'http://www.camdemy.com/folder/450',


@@ -23,6 +23,7 @@ class CanalplusIE(InfoExtractor):
         (?:(?:www|m)\.)?canalplus\.fr|
         (?:www\.)?piwiplus\.fr|
         (?:www\.)?d8\.tv|
+        (?:www\.)?c8\.fr|
         (?:www\.)?d17\.tv|
         (?:www\.)?itele\.fr
     )/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
@@ -35,6 +36,7 @@ class CanalplusIE(InfoExtractor):
         'canalplus': 'cplus',
         'piwiplus': 'teletoon',
         'd8': 'd8',
+        'c8': 'd8',
         'd17': 'd17',
         'itele': 'itele',
     }


@@ -1,11 +1,13 @@
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
 from ..utils import float_or_none
 
 
 class CanvasIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?canvas\.be/video/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:www\.)?(?P<site_id>canvas|een)\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'
     _TESTS = [{
         'url': 'http://www.canvas.be/video/de-afspraak/najaar-2015/de-afspraak-veilt-voor-de-warmste-week',
         'md5': 'ea838375a547ac787d4064d8c7860a6c',
@@ -38,22 +40,42 @@ class CanvasIE(InfoExtractor):
         'params': {
             'skip_download': True,
         }
+    }, {
+        'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
+        'info_dict': {
+            'id': 'mz-ast-11a587f8-b921-4266-82e2-0bce3e80d07f',
+            'display_id': 'herbekijk-sorry-voor-alles',
+            'ext': 'mp4',
+            'title': 'Herbekijk Sorry voor alles',
+            'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'duration': 3788.06,
+        },
+        'params': {
+            'skip_download': True,
+        }
+    }, {
+        'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
-        display_id = self._match_id(url)
+        mobj = re.match(self._VALID_URL, url)
+        site_id, display_id = mobj.group('site_id'), mobj.group('id')
 
         webpage = self._download_webpage(url, display_id)
 
-        title = self._search_regex(
+        title = (self._search_regex(
             r'<h1[^>]+class="video__body__header__title"[^>]*>(.+?)</h1>',
-            webpage, 'title', default=None) or self._og_search_title(webpage)
+            webpage, 'title', default=None) or self._og_search_title(
+            webpage)).strip()
 
         video_id = self._html_search_regex(
-            r'data-video=(["\'])(?P<id>.+?)\1', webpage, 'video id', group='id')
+            r'data-video=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video id', group='id')
 
         data = self._download_json(
-            'https://mediazone.vrt.be/api/v1/canvas/assets/%s' % video_id, display_id)
+            'https://mediazone.vrt.be/api/v1/%s/assets/%s'
+            % (site_id, video_id), display_id)
 
         formats = []
         for target in data['targetUrls']:


@@ -4,7 +4,7 @@ from .cbs import CBSBaseIE
 class CBSSportsIE(CBSBaseIE):
-    _VALID_URL = r'https?://www\.cbssports\.com/video/player/[^/]+/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?cbssports\.com/video/player/[^/]+/(?P<id>\d+)'
     _TESTS = [{
         'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast',


@@ -17,7 +17,7 @@ from ..utils import (
 class CeskaTelevizeIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.ceskatelevize\.cz/(porady|ivysilani)/(?:[^/]+/)*(?P<id>[^/#?]+)/*(?:[#?].*)?$'
+    _VALID_URL = r'https?://(?:www\.)?ceskatelevize\.cz/(porady|ivysilani)/(?:[^/]+/)*(?P<id>[^/#?]+)/*(?:[#?].*)?$'
     _TESTS = [{
         'url': 'http://www.ceskatelevize.cz/ivysilani/ivysilani/10441294653-hyde-park-civilizace/214411058091220',
         'info_dict': {


@@ -65,7 +65,7 @@ class ChirbitIE(InfoExtractor):
 class ChirbitProfileIE(InfoExtractor):
     IE_NAME = 'chirbit:profile'
-    _VALID_URL = r'https?://(?:www\.)?chirbit.com/(?:rss/)?(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?chirbit\.com/(?:rss/)?(?P<id>[^/]+)'
     _TEST = {
         'url': 'http://chirbit.com/ScarletBeauty',
         'info_dict': {


@@ -6,7 +6,7 @@ from ..utils import ExtractorError
 class CMTIE(MTVIE):
     IE_NAME = 'cmt.com'
-    _VALID_URL = r'https?://www\.cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)'
     _FEED_URL = 'http://www.cmt.com/sitewide/apps/player/embed/rss/'
     _TESTS = [{


@@ -87,6 +87,9 @@ class InfoExtractor(object):
                     Potential fields:
                     * url        Mandatory. The URL of the video file
+                    * manifest_url
+                                 The URL of the manifest file in case of
+                                 fragmented media (DASH, hls, hds)
                     * ext        Will be calculated from URL if missing
                     * format     A human-readable description of the format
                                  ("mp4 container with h264/opus").
@@ -115,6 +118,11 @@ class InfoExtractor(object):
                                  download, lower-case.
                                  "http", "https", "rtsp", "rtmp", "rtmpe",
                                  "m3u8", "m3u8_native" or "http_dash_segments".
+                    * fragments  A list of fragments of the fragmented media,
+                                 with the following entries:
+                                 * "url" (mandatory) - fragment's URL
+                                 * "duration" (optional, int or float)
+                                 * "filesize" (optional, int)
                     * preference Order number of this format. If this field is
                                  present and not None, the formats get sorted
                                  by this field, regardless of all other values.
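A format dict using the new fields might look like the following (illustrative values only): `url` stays the direct media URL, `manifest_url` points at the manifest itself, and optional per-fragment fields can be aggregated when present.

```python
fmt = {
    'format_id': 'dash-video-1',
    'url': 'https://example.com/video.mp4',             # direct media URL
    'manifest_url': 'https://example.com/manifest.mpd',  # the manifest itself
    'protocol': 'http_dash_segments',
    'fragments': [
        {'url': 'https://example.com/init.mp4'},  # only 'url' is mandatory
        {'url': 'https://example.com/seg1.m4s', 'duration': 4.0, 'filesize': 250000},
        {'url': 'https://example.com/seg2.m4s', 'duration': 4.0, 'filesize': 260000},
    ],
}

# 'duration' and 'filesize' are optional, so guard each access:
total_duration = sum(f['duration'] for f in fmt['fragments'] if 'duration' in f)
```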
@@ -674,33 +682,36 @@ class InfoExtractor(object):
                 username = info[0]
                 password = info[2]
             else:
-                raise netrc.NetrcParseError('No authenticators for %s' % netrc_machine)
+                raise netrc.NetrcParseError(
+                    'No authenticators for %s' % netrc_machine)
         except (IOError, netrc.NetrcParseError) as err:
-            self._downloader.report_warning('parsing .netrc: %s' % error_to_compat_str(err))
+            self._downloader.report_warning(
+                'parsing .netrc: %s' % error_to_compat_str(err))
 
-        return (username, password)
+        return username, password
 
-    def _get_login_info(self):
+    def _get_login_info(self, username_option='username', password_option='password', netrc_machine=None):
         """
         Get the login info as (username, password)
-        It will look in the netrc file using the _NETRC_MACHINE value
+        First look for the manually specified credentials using username_option
+        and password_option as keys in params dictionary. If no such credentials
+        available look in the netrc file using the netrc_machine or _NETRC_MACHINE
+        value.
         If there's no info available, return (None, None)
         """
         if self._downloader is None:
             return (None, None)
 
-        username = None
-        password = None
         downloader_params = self._downloader.params
 
         # Attempt to use provided username and password or .netrc data
-        if downloader_params.get('username') is not None:
-            username = downloader_params['username']
-            password = downloader_params['password']
+        if downloader_params.get(username_option) is not None:
+            username = downloader_params[username_option]
+            password = downloader_params[password_option]
         else:
-            username, password = self._get_netrc_login_info()
+            username, password = self._get_netrc_login_info(netrc_machine)
 
-        return (username, password)
+        return username, password
 
     def _get_tfa_info(self, note='two-factor verification code'):
         """
@@ -888,16 +899,16 @@ class InfoExtractor(object):
     def _hidden_inputs(html):
         html = re.sub(r'<!--(?:(?!<!--).)*-->', '', html)
         hidden_inputs = {}
-        for input in re.findall(r'(?i)<input([^>]+)>', html):
-            if not re.search(r'type=(["\'])(?:hidden|submit)\1', input):
+        for input in re.findall(r'(?i)(<input[^>]+>)', html):
+            attrs = extract_attributes(input)
+            if not input:
                 continue
-            name = re.search(r'(?:name|id)=(["\'])(?P<value>.+?)\1', input)
-            if not name:
+            if attrs.get('type') not in ('hidden', 'submit'):
                 continue
-            value = re.search(r'value=(["\'])(?P<value>.*?)\1', input)
-            if not value:
-                continue
-            hidden_inputs[name.group('value')] = value.group('value')
+            name = attrs.get('name') or attrs.get('id')
+            value = attrs.get('value')
+            if name and value is not None:
+                hidden_inputs[name] = value
         return hidden_inputs
 
     def _form_hidden_inputs(self, form_id, html):
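The rewritten `_hidden_inputs` parses each `<input>` tag's attributes instead of regex-matching them, keeping only `hidden` and `submit` inputs (the ones a browser would resubmit). A rough equivalent using only the standard library's `html.parser` — a sketch, not youtube-dl's own `extract_attributes`, which additionally decodes HTML entities and supports Python 2:

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name (or id) -> value pairs from hidden/submit <input> tags,
    mirroring the attribute-based logic in the diff."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.hidden_inputs = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attrs = dict(attrs)
        if attrs.get('type') not in ('hidden', 'submit'):
            return
        name = attrs.get('name') or attrs.get('id')
        value = attrs.get('value')
        # An empty value is still valid; only a missing one is skipped.
        if name and value is not None:
            self.hidden_inputs[name] = value


def hidden_inputs(html):
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.hidden_inputs
```

An attribute parser handles unquoted values, reordered attributes, and embedded quotes that the old per-attribute regexes missed.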
@@ -1139,6 +1150,7 @@ class InfoExtractor(object):
                 formats.append({
                     'format_id': format_id,
                     'url': manifest_url,
+                    'manifest_url': manifest_url,
                     'ext': 'flv' if bootstrap_info is not None else None,
                     'tbr': tbr,
                     'width': width,
@@ -1244,9 +1256,11 @@ class InfoExtractor(object):
                     # format_id intact.
                     if not live:
                         format_id.append(stream_name if stream_name else '%d' % (tbr if tbr else len(formats)))
+                    manifest_url = format_url(line.strip())
                     f = {
                         'format_id': '-'.join(format_id),
-                        'url': format_url(line.strip()),
+                        'url': manifest_url,
+                        'manifest_url': manifest_url,
                         'tbr': tbr,
                         'ext': ext,
                         'fps': float_or_none(last_info.get('FRAME-RATE')),
@@ -1518,9 +1532,10 @@ class InfoExtractor(object):
         mpd_base_url = re.match(r'https?://.+/', urlh.geturl()).group()
 
         return self._parse_mpd_formats(
-            compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url, formats_dict=formats_dict)
+            compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url,
+            formats_dict=formats_dict, mpd_url=mpd_url)
 
-    def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}):
+    def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}, mpd_url=None):
         """
         Parse formats from MPD manifest.
         References:
@@ -1541,42 +1556,52 @@ class InfoExtractor(object):
 def extract_multisegment_info(element, ms_parent_info):
     ms_info = ms_parent_info.copy()
+
+    # As per [1, 5.3.9.2.2] SegmentList and SegmentTemplate share some
+    # common attributes and elements. We will only extract relevant
+    # for us.
+    def extract_common(source):
+        segment_timeline = source.find(_add_ns('SegmentTimeline'))
+        if segment_timeline is not None:
+            s_e = segment_timeline.findall(_add_ns('S'))
+            if s_e:
+                ms_info['total_number'] = 0
+                ms_info['s'] = []
+                for s in s_e:
+                    r = int(s.get('r', 0))
+                    ms_info['total_number'] += 1 + r
+                    ms_info['s'].append({
+                        't': int(s.get('t', 0)),
+                        # @d is mandatory (see [1, 5.3.9.6.2, Table 17, page 60])
+                        'd': int(s.attrib['d']),
+                        'r': r,
+                    })
+        start_number = source.get('startNumber')
+        if start_number:
+            ms_info['start_number'] = int(start_number)
+        timescale = source.get('timescale')
+        if timescale:
+            ms_info['timescale'] = int(timescale)
+        segment_duration = source.get('duration')
+        if segment_duration:
+            ms_info['segment_duration'] = int(segment_duration)
+
+    def extract_Initialization(source):
+        initialization = source.find(_add_ns('Initialization'))
+        if initialization is not None:
+            ms_info['initialization_url'] = initialization.attrib['sourceURL']
+
     segment_list = element.find(_add_ns('SegmentList'))
     if segment_list is not None:
+        extract_common(segment_list)
+        extract_Initialization(segment_list)
         segment_urls_e = segment_list.findall(_add_ns('SegmentURL'))
         if segment_urls_e:
             ms_info['segment_urls'] = [segment.attrib['media'] for segment in segment_urls_e]
-        initialization = segment_list.find(_add_ns('Initialization'))
-        if initialization is not None:
-            ms_info['initialization_url'] = initialization.attrib['sourceURL']
     else:
         segment_template = element.find(_add_ns('SegmentTemplate'))
         if segment_template is not None:
-            start_number = segment_template.get('startNumber')
-            if start_number:
-                ms_info['start_number'] = int(start_number)
-            segment_timeline = segment_template.find(_add_ns('SegmentTimeline'))
-            if segment_timeline is not None:
-                s_e = segment_timeline.findall(_add_ns('S'))
-                if s_e:
-                    ms_info['total_number'] = 0
-                    ms_info['s'] = []
-                    for s in s_e:
-                        r = int(s.get('r', 0))
-                        ms_info['total_number'] += 1 + r
-                        ms_info['s'].append({
-                            't': int(s.get('t', 0)),
-                            # @d is mandatory (see [1, 5.3.9.6.2, Table 17, page 60])
-                            'd': int(s.attrib['d']),
-                            'r': r,
-                        })
-            else:
-                timescale = segment_template.get('timescale')
-                if timescale:
-                    ms_info['timescale'] = int(timescale)
-                segment_duration = segment_template.get('duration')
-                if segment_duration:
-                    ms_info['segment_duration'] = int(segment_duration)
+            extract_common(segment_template)
             media_template = segment_template.get('media')
             if media_template:
                 ms_info['media_template'] = media_template
@@ -1584,11 +1609,14 @@ class InfoExtractor(object):
         if initialization:
             ms_info['initialization_url'] = initialization
         else:
-            initialization = segment_template.find(_add_ns('Initialization'))
-            if initialization is not None:
-                ms_info['initialization_url'] = initialization.attrib['sourceURL']
+            extract_Initialization(segment_template)
     return ms_info

+def combine_url(base_url, target_url):
+    if re.match(r'^https?://', target_url):
+        return target_url
+    return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
 mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
 formats = []
 for period in mpd_doc.findall(_add_ns('Period')):
@@ -1631,6 +1659,7 @@ class InfoExtractor(object):
 f = {
     'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
     'url': base_url,
+    'manifest_url': mpd_url,
     'ext': mimetype2ext(mime_type),
     'width': int_or_none(representation_attrib.get('width')),
     'height': int_or_none(representation_attrib.get('height')),
@@ -1645,9 +1674,7 @@ class InfoExtractor(object):
 }
 representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
 if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info:
-    if 'total_number' not in representation_ms_info and 'segment_duration':
-        segment_duration = float(representation_ms_info['segment_duration']) / float(representation_ms_info['timescale'])
-        representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
     media_template = representation_ms_info['media_template']
     media_template = media_template.replace('$RepresentationID$', representation_id)
     media_template = re.sub(r'\$(Number|Bandwidth|Time)\$', r'%(\1)d', media_template)
@@ -1656,46 +1683,79 @@ class InfoExtractor(object):
     # As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
     # can't be used at the same time
-    if '%(Number' in media_template:
-        representation_ms_info['segment_urls'] = [
-            media_template % {
+    if '%(Number' in media_template and 's' not in representation_ms_info:
+        segment_duration = None
+        if 'total_number' not in representation_ms_info and 'segment_duration' in representation_ms_info:
+            segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
+            representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
+        representation_ms_info['fragments'] = [{
+            'url': media_template % {
                 'Number': segment_number,
                 'Bandwidth': representation_attrib.get('bandwidth'),
-            }
-            for segment_number in range(
-                representation_ms_info['start_number'],
-                representation_ms_info['total_number'] + representation_ms_info['start_number'])]
+            },
+            'duration': segment_duration,
+        } for segment_number in range(
+            representation_ms_info['start_number'],
+            representation_ms_info['total_number'] + representation_ms_info['start_number'])]
     else:
-        representation_ms_info['segment_urls'] = []
+        # $Number*$ or $Time$ in media template with S list available
+        # Example $Number*$: http://www.svtplay.se/klipp/9023742/stopptid-om-bjorn-borg
+        # Example $Time$: https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411
+        representation_ms_info['fragments'] = []
         segment_time = 0
+        segment_d = None
+        segment_number = representation_ms_info['start_number']

         def add_segment_url():
-            representation_ms_info['segment_urls'].append(
-                media_template % {
-                    'Time': segment_time,
-                    'Bandwidth': representation_attrib.get('bandwidth'),
-                }
-            )
+            segment_url = media_template % {
+                'Time': segment_time,
+                'Bandwidth': representation_attrib.get('bandwidth'),
+                'Number': segment_number,
+            }
+            representation_ms_info['fragments'].append({
+                'url': segment_url,
+                'duration': float_or_none(segment_d, representation_ms_info['timescale']),
+            })

         for num, s in enumerate(representation_ms_info['s']):
             segment_time = s.get('t') or segment_time
+            segment_d = s['d']
             add_segment_url()
+            segment_number += 1
             for r in range(s.get('r', 0)):
-                segment_time += s['d']
+                segment_time += segment_d
                 add_segment_url()
-            segment_time += s['d']
-if 'segment_urls' in representation_ms_info:
+                segment_number += 1
+            segment_time += segment_d
+elif 'segment_urls' in representation_ms_info and 's' in representation_ms_info:
+    # No media template
+    # Example: https://www.youtube.com/watch?v=iXZV5uAYMJI
+    # or any YouTube dashsegments video
+    fragments = []
+    s_num = 0
+    for segment_url in representation_ms_info['segment_urls']:
+        s = representation_ms_info['s'][s_num]
+        for r in range(s.get('r', 0) + 1):
+            fragments.append({
+                'url': segment_url,
+                'duration': float_or_none(s['d'], representation_ms_info['timescale']),
+            })
+    representation_ms_info['fragments'] = fragments
+# NB: MPD manifest may contain direct URLs to unfragmented media.
+# No fragments key is present in this case.
+if 'fragments' in representation_ms_info:
     f.update({
-        'segment_urls': representation_ms_info['segment_urls'],
+        'fragments': [],
         'protocol': 'http_dash_segments',
     })
     if 'initialization_url' in representation_ms_info:
         initialization_url = representation_ms_info['initialization_url'].replace('$RepresentationID$', representation_id)
-        f.update({
-            'initialization_url': initialization_url,
-        })
         if not f.get('url'):
             f['url'] = initialization_url
+        f['fragments'].append({'url': initialization_url})
+    f['fragments'].extend(representation_ms_info['fragments'])
+    for fragment in f['fragments']:
+        fragment['url'] = combine_url(base_url, fragment['url'])
 try:
     existing_format = next(
         fo for fo in formats
@@ -1792,6 +1852,49 @@ class InfoExtractor(object):
     m3u8_id='hls', fatal=False))
 return formats
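The fragments interface introduced above can be sketched standalone: `combine_url` is copied from this diff, while the format dict, the `example.com` URLs, and the fragment names are illustrative assumptions, not values any extractor actually emits.

```python
import re


def combine_url(base_url, target_url):
    # Absolute fragment URLs are kept as-is; relative ones are joined
    # onto the manifest's base URL (mirrors the helper in the diff).
    if re.match(r'^https?://', target_url):
        return target_url
    return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)


# A fragment-based format dict shaped like the refactored interface:
# 'url' carries the direct/init URL, 'manifest_url' the manifest itself,
# and each fragment has a URL plus an optional duration in seconds.
fmt = {
    'url': 'https://cdn.example.com/media/init.mp4',
    'manifest_url': 'https://cdn.example.com/media/manifest.mpd',
    'protocol': 'http_dash_segments',
    'fragments': [
        {'url': 'seg-1.m4s', 'duration': 4.0},
        {'url': 'seg-2.m4s', 'duration': 4.0},
    ],
}
for fragment in fmt['fragments']:
    fragment['url'] = combine_url('https://cdn.example.com/media', fragment['url'])
```

A downloader consuming this interface no longer cares whether the URLs came from a SegmentList, a SegmentTemplate, or an S timeline.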
+def _extract_wowza_formats(self, url, video_id, m3u8_entry_protocol='m3u8_native', skip_protocols=[]):
+    url = re.sub(r'/(?:manifest|playlist|jwplayer)\.(?:m3u8|f4m|mpd|smil)', '', url)
+    url_base = self._search_regex(r'(?:https?|rtmp|rtsp)(://[^?]+)', url, 'format url')
+    http_base_url = 'http' + url_base
+    formats = []
+    if 'm3u8' not in skip_protocols:
+        formats.extend(self._extract_m3u8_formats(
+            http_base_url + '/playlist.m3u8', video_id, 'mp4',
+            m3u8_entry_protocol, m3u8_id='hls', fatal=False))
+    if 'f4m' not in skip_protocols:
+        formats.extend(self._extract_f4m_formats(
+            http_base_url + '/manifest.f4m',
+            video_id, f4m_id='hds', fatal=False))
+    if re.search(r'(?:/smil:|\.smil)', url_base):
+        if 'dash' not in skip_protocols:
+            formats.extend(self._extract_mpd_formats(
+                http_base_url + '/manifest.mpd',
+                video_id, mpd_id='dash', fatal=False))
+        if 'smil' not in skip_protocols:
+            rtmp_formats = self._extract_smil_formats(
+                http_base_url + '/jwplayer.smil',
+                video_id, fatal=False)
+            for rtmp_format in rtmp_formats:
+                rtsp_format = rtmp_format.copy()
+                rtsp_format['url'] = '%s/%s' % (rtmp_format['url'], rtmp_format['play_path'])
+                del rtsp_format['play_path']
+                del rtsp_format['ext']
+                rtsp_format.update({
+                    'url': rtsp_format['url'].replace('rtmp://', 'rtsp://'),
+                    'format_id': rtmp_format['format_id'].replace('rtmp', 'rtsp'),
+                    'protocol': 'rtsp',
+                })
+                formats.extend([rtmp_format, rtsp_format])
+    else:
+        for protocol in ('rtmp', 'rtsp'):
+            if protocol not in skip_protocols:
+                formats.append({
+                    'url': protocol + url_base,
+                    'format_id': protocol,
+                    'protocol': protocol,
+                })
+    return formats
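The base-URL derivation at the top of `_extract_wowza_formats` can be exercised on its own. The two regexes below are taken verbatim from the diff; the helper name and the sample URL are illustrative assumptions.

```python
import re


def wowza_url_base(url):
    # Drop a trailing manifest/playlist filename, then keep the
    # scheme-less remainder so any protocol can be prepended to it.
    url = re.sub(r'/(?:manifest|playlist|jwplayer)\.(?:m3u8|f4m|mpd|smil)', '', url)
    return re.search(r'(?:https?|rtmp|rtsp)(://[^?]+)', url).group(1)


base = wowza_url_base('rtmp://media.example.com/vod/mp4:clip.mp4/playlist.m3u8')
```

Prepending `'http'`, `'rtmp'`, or `'rtsp'` to the returned base is exactly how the method fans one URL out into per-protocol format candidates.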
 def _live_title(self, name):
     """ Generate the title for a live video """
     now = datetime.datetime.now()


@@ -7,7 +7,7 @@ from .common import InfoExtractor
 class CriterionIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.criterion\.com/films/(?P<id>[0-9]+)-.+'
+    _VALID_URL = r'https?://(?:www\.)?criterion\.com/films/(?P<id>[0-9]+)-.+'
     _TEST = {
         'url': 'http://www.criterion.com/films/184-le-samourai',
         'md5': 'bc51beba55685509883a9a7830919ec3',


@@ -34,22 +34,51 @@ from ..aes import (
 class CrunchyrollBaseIE(InfoExtractor):
+    _LOGIN_URL = 'https://www.crunchyroll.com/login'
+    _LOGIN_FORM = 'login_form'
     _NETRC_MACHINE = 'crunchyroll'

     def _login(self):
         (username, password) = self._get_login_info()
         if username is None:
             return
-        self.report_login()
-        login_url = 'https://www.crunchyroll.com/?a=formhandler'
-        data = urlencode_postdata({
-            'formname': 'RpcApiUser_Login',
-            'name': username,
-            'password': password,
-        })
-        login_request = sanitized_Request(login_url, data)
-        login_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        self._download_webpage(login_request, None, False, 'Wrong login info')
+        login_page = self._download_webpage(
+            self._LOGIN_URL, None, 'Downloading login page')
+
+        login_form_str = self._search_regex(
+            r'(?P<form><form[^>]+?id=(["\'])%s\2[^>]*>)' % self._LOGIN_FORM,
+            login_page, 'login form', group='form')
+
+        post_url = extract_attributes(login_form_str).get('action')
+        if not post_url:
+            post_url = self._LOGIN_URL
+        elif not post_url.startswith('http'):
+            post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
+
+        login_form = self._form_hidden_inputs(self._LOGIN_FORM, login_page)
+
+        login_form.update({
+            'login_form[name]': username,
+            'login_form[password]': password,
+        })
+
+        response = self._download_webpage(
+            post_url, None, 'Logging in', 'Wrong login info',
+            data=urlencode_postdata(login_form),
+            headers={'Content-Type': 'application/x-www-form-urlencoded'})
+
+        # Successful login
+        if '<title>Redirecting' in response:
+            return
+
+        error = self._html_search_regex(
+            '(?s)<ul[^>]+class=["\']messages["\'][^>]*>(.+?)</ul>',
+            response, 'error message', default=None)
+        if error:
+            raise ExtractorError('Unable to login: %s' % error, expected=True)
+
+        raise ExtractorError('Unable to log in')

     def _real_initialize(self):
         self._login()
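The new login flow posts back the login form's hidden inputs (CSRF token included) instead of a fixed RPC payload. A minimal sketch of that hidden-input harvesting, assuming a simplified regex in place of youtube-dl's `_form_hidden_inputs` and an invented sample form:

```python
import re


def form_hidden_inputs(form_html):
    # Collect name/value pairs of hidden <input> elements so a login
    # POST can echo back server-generated tokens unchanged.
    return dict(re.findall(
        r'<input[^>]+type=["\']hidden["\'][^>]+name=["\']([^"\']+)["\'][^>]+value=["\']([^"\']*)["\']',
        form_html))


form = ('<form id="login_form">'
        '<input type="hidden" name="login_form[_token]" value="abc123"/>'
        '</form>')
hidden = form_hidden_inputs(form)
# Credentials are layered on top of whatever the server pre-filled.
hidden.update({'login_form[name]': 'user', 'login_form[password]': 'pass'})
```

The regex here assumes attribute order `type`, `name`, `value`; the real helper parses attributes properly and is order-independent.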


@@ -6,7 +6,7 @@ from ..compat import compat_str
 class DctpTvIE(InfoExtractor):
-    _VALID_URL = r'https?://www.dctp.tv/(#/)?filme/(?P<id>.+?)/$'
+    _VALID_URL = r'https?://(?:www\.)?dctp\.tv/(#/)?filme/(?P<id>.+?)/$'
     _TEST = {
         'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
         'info_dict': {


@@ -13,7 +13,7 @@ from ..utils import (
 class DemocracynowIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?democracynow.org/(?P<id>[^\?]*)'
+    _VALID_URL = r'https?://(?:www\.)?democracynow\.org/(?P<id>[^\?]*)'
     IE_NAME = 'democracynow'
     _TESTS = [{
         'url': 'http://www.democracynow.org/shows/2015/7/3',


@@ -4,7 +4,7 @@ from .common import InfoExtractor
 class EngadgetIE(InfoExtractor):
-    _VALID_URL = r'https?://www.engadget.com/video/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?engadget\.com/video/(?P<id>[^/?#]+)'
     _TESTS = [{
         # video with 5min ID


@@ -8,7 +8,7 @@ from ..utils import (
 class ExpoTVIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.expotv\.com/videos/[^?#]*/(?P<id>[0-9]+)($|[?#])'
+    _VALID_URL = r'https?://(?:www\.)?expotv\.com/videos/[^?#]*/(?P<id>[0-9]+)($|[?#])'
     _TEST = {
         'url': 'http://www.expotv.com/videos/reviews/3/40/NYX-Butter-lipstick/667916',
         'md5': 'fe1d728c3a813ff78f595bc8b7a707a8',


@@ -93,6 +93,7 @@ from .bbc import (
 )
 from .beeg import BeegIE
 from .behindkink import BehindKinkIE
+from .bellmedia import BellMediaIE
 from .beatportpro import BeatportProIE
 from .bet import BetIE
 from .bigflix import BigflixIE
@@ -195,7 +196,6 @@ from .crunchyroll import (
 )
 from .cspan import CSpanIE
 from .ctsnews import CtsNewsIE
-from .ctv import CTVIE
 from .ctvnews import CTVNewsIE
 from .cultureunplugged import CultureUnpluggedIE
 from .curiositystream import (
@@ -293,6 +293,7 @@ from .fox import FOXIE
 from .foxgay import FoxgayIE
 from .foxnews import (
     FoxNewsIE,
+    FoxNewsArticleIE,
     FoxNewsInsiderIE,
 )
 from .foxsports import FoxSportsIE
@@ -395,6 +396,7 @@ from .ivi import (
     IviCompilationIE
 )
 from .ivideon import IvideonIE
+from .iwara import IwaraIE
 from .izlesene import IzleseneIE
 from .jeuxvideo import JeuxVideoIE
 from .jove import JoveIE
@@ -407,6 +409,7 @@ from .kankan import KankanIE
 from .karaoketv import KaraoketvIE
 from .karrierevideos import KarriereVideosIE
 from .keezmovies import KeezMoviesIE
+from .ketnet import KetnetIE
 from .khanacademy import KhanAcademyIE
 from .kickstarter import KickStarterIE
 from .keek import KeekIE
@@ -469,6 +472,10 @@ from .macgamestore import MacGameStoreIE
 from .mailru import MailRuIE
 from .makerschannel import MakersChannelIE
 from .makertv import MakerTVIE
+from .mangomolo import (
+    MangomoloVideoIE,
+    MangomoloLiveIE,
+)
 from .matchtv import MatchTVIE
 from .mdr import MDRIE
 from .meta import METAIE
@@ -531,6 +538,7 @@ from .nbc import (
     CSNNEIE,
     NBCIE,
     NBCNewsIE,
+    NBCOlympicsIE,
     NBCSportsIE,
     NBCSportsVPlayerIE,
 )
@@ -670,7 +678,10 @@ from .pluralsight import (
 )
 from .podomatic import PodomaticIE
 from .pokemon import PokemonIE
-from .polskieradio import PolskieRadioIE
+from .polskieradio import (
+    PolskieRadioIE,
+    PolskieRadioCategoryIE,
+)
 from .porn91 import Porn91IE
 from .porncom import PornComIE
 from .pornhd import PornHdIE
@@ -861,10 +872,12 @@ from .telebruxelles import TeleBruxellesIE
 from .telecinco import TelecincoIE
 from .telegraaf import TelegraafIE
 from .telemb import TeleMBIE
+from .telequebec import TeleQuebecIE
 from .teletask import TeleTaskIE
 from .telewebion import TelewebionIE
 from .testurl import TestURLIE
 from .tf1 import TF1IE
+from .tfo import TFOIE
 from .theintercept import TheInterceptIE
 from .theplatform import (
     ThePlatformIE,
@@ -893,7 +906,6 @@ from .toutv import TouTvIE
 from .toypics import ToypicsUserIE, ToypicsIE
 from .traileraddict import TrailerAddictIE
 from .trilulilu import TriluliluIE
-from .trollvids import TrollvidsIE
 from .trutv import TruTVIE
 from .tube8 import Tube8IE
 from .tubitv import TubiTvIE
@@ -1057,6 +1069,7 @@ from .vporn import VpornIE
 from .vrt import VRTIE
 from .vube import VubeIE
 from .vuclip import VuClipIE
+from .vyborymos import VyboryMosIE
 from .walla import WallaIE
 from .washingtonpost import (
     WashingtonPostIE,


@@ -7,6 +7,7 @@ from .common import InfoExtractor
 class FoxNewsIE(AMPIE):
+    IE_NAME = 'foxnews'
     IE_DESC = 'Fox News and Fox Business Video'
     _VALID_URL = r'https?://(?P<host>video\.(?:insider\.)?fox(?:news|business)\.com)/v/(?:video-embed\.html\?video_id=)?(?P<id>\d+)'
     _TESTS = [
@@ -66,6 +67,35 @@ class FoxNewsIE(AMPIE):
         return info

+class FoxNewsArticleIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?foxnews\.com/(?!v)([^/]+/)+(?P<id>[a-z-]+)'
+    IE_NAME = 'foxnews:article'
+
+    _TEST = {
+        'url': 'http://www.foxnews.com/politics/2016/09/08/buzz-about-bud-clinton-camp-denies-claims-wore-earpiece-at-forum.html',
+        'md5': '62aa5a781b308fdee212ebb6f33ae7ef',
+        'info_dict': {
+            'id': '5116295019001',
+            'ext': 'mp4',
+            'title': 'Trump and Clinton asked to defend positions on Iraq War',
+            'description': 'Veterans react on \'The Kelly File\'',
+            'timestamp': 1473299755,
+            'upload_date': '20160908',
+        },
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+
+        video_id = self._html_search_regex(
+            r'data-video-id=([\'"])(?P<id>[^\'"]+)\1',
+            webpage, 'video ID', group='id')
+        return self.url_result(
+            'http://video.foxnews.com/v/' + video_id,
+            FoxNewsIE.ie_key())
+
+
 class FoxNewsInsiderIE(InfoExtractor):
     _VALID_URL = r'https?://insider\.foxnews\.com/([^/]+/)+(?P<id>[a-z-]+)'
     IE_NAME = 'foxnews:insider'
@@ -83,6 +113,10 @@ class FoxNewsInsiderIE(InfoExtractor):
         'upload_date': '20160825',
         'thumbnail': 're:^https?://.*\.jpg$',
     },
+    'params': {
+        # m3u8 download
+        'skip_download': True,
+    },
     'add_ie': [FoxNewsIE.ie_key()],
 }


@@ -2,21 +2,21 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
-from ..utils import int_or_none
+from ..utils import month_by_name


 class FranceInterIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?franceinter\.fr/player/reecouter\?play=(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?franceinter\.fr/emissions/(?P<id>[^?#]+)'
+
     _TEST = {
-        'url': 'http://www.franceinter.fr/player/reecouter?play=793962',
-        'md5': '4764932e466e6f6c79c317d2e74f6884',
+        'url': 'https://www.franceinter.fr/emissions/affaires-sensibles/affaires-sensibles-07-septembre-2016',
+        'md5': '9e54d7bdb6fdc02a841007f8a975c094',
         'info_dict': {
-            'id': '793962',
+            'id': 'affaires-sensibles/affaires-sensibles-07-septembre-2016',
             'ext': 'mp3',
-            'title': 'L’Histoire dans les jeux vidéo',
-            'description': 'md5:7e93ddb4451e7530022792240a3049c7',
-            'timestamp': 1387369800,
-            'upload_date': '20131218',
+            'title': 'Affaire Cahuzac : le contentieux du compte en Suisse',
+            'description': 'md5:401969c5d318c061f86bda1fa359292b',
+            'upload_date': '20160907',
         },
     }
@@ -25,23 +25,30 @@ class FranceInterIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)

-        path = self._search_regex(
-            r'<a id="player".+?href="([^"]+)"', webpage, 'video url')
-        video_url = 'http://www.franceinter.fr/' + path
+        video_url = self._search_regex(
+            r'(?s)<div[^>]+class=["\']page-diffusion["\'][^>]*>.*?<button[^>]+data-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
+            webpage, 'video url', group='url')

-        title = self._html_search_regex(
-            r'<span class="title-diffusion">(.+?)</span>', webpage, 'title')
-        description = self._html_search_regex(
-            r'<span class="description">(.*?)</span>',
-            webpage, 'description', fatal=False)
-        timestamp = int_or_none(self._search_regex(
-            r'data-date="(\d+)"', webpage, 'upload date', fatal=False))
+        title = self._og_search_title(webpage)
+        description = self._og_search_description(webpage)
+
+        upload_date_str = self._search_regex(
+            r'class=["\']cover-emission-period["\'][^>]*>[^<]+\s+(\d{1,2}\s+[^\s]+\s+\d{4})<',
+            webpage, 'upload date', fatal=False)
+        if upload_date_str:
+            upload_date_list = upload_date_str.split()
+            upload_date_list.reverse()
+            upload_date_list[1] = '%02d' % (month_by_name(upload_date_list[1], lang='fr') or 0)
+            upload_date_list[2] = '%02d' % int(upload_date_list[2])
+            upload_date = ''.join(upload_date_list)
+        else:
+            upload_date = None

         return {
             'id': video_id,
             'title': title,
             'description': description,
-            'timestamp': timestamp,
+            'upload_date': upload_date,
             'formats': [{
                 'url': video_url,
                 'vcodec': 'none',
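The reversed-list date trick in the new extractor can be condensed into a standalone sketch. It uses its own French month table instead of youtube-dl's `month_by_name(..., lang='fr')` helper; the function name is illustrative.

```python
FRENCH_MONTHS = ['janvier', 'février', 'mars', 'avril', 'mai', 'juin',
                 'juillet', 'août', 'septembre', 'octobre', 'novembre', 'décembre']


def upload_date_from_french(date_str):
    # '7 septembre 2016' -> '20160907' (YYYYMMDD, the upload_date format
    # youtube-dl expects).
    day, month, year = date_str.split()
    return '%s%02d%02d' % (year, FRENCH_MONTHS.index(month.lower()) + 1, int(day))
```

The extractor's version reaches the same result by reversing the split list in place and zero-padding the month and day fields.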


@@ -8,7 +8,7 @@ from .common import InfoExtractor
 class FreespeechIE(InfoExtractor):
     IE_NAME = 'freespeech.org'
-    _VALID_URL = r'https://www\.freespeech\.org/video/(?P<title>.+)'
+    _VALID_URL = r'https?://(?:www\.)?freespeech\.org/video/(?P<title>.+)'
     _TEST = {
         'add_ie': ['Youtube'],
         'url': 'https://www.freespeech.org/video/obama-romney-campaign-colorado-ahead-debate-0',


@@ -9,7 +9,7 @@ from ..utils import (
 class GameStarIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?gamestar\.de/videos/.*,(?P<id>[0-9]+)\.html'
     _TEST = {
         'url': 'http://www.gamestar.de/videos/trailer,3/hobbit-3-die-schlacht-der-fuenf-heere,76110.html',
         'md5': '96974ecbb7fd8d0d20fca5a00810cea7',


@@ -1369,6 +1369,11 @@ class GenericIE(InfoExtractor):
     },
     'add_ie': ['Vimeo'],
 },
+{
+    # generic vimeo embed that requires original URL passed as Referer
+    'url': 'http://racing4everyone.eu/2016/07/30/formula-1-2016-round12-germany/',
+    'only_matching': True,
+},
 {
     'url': 'https://support.arkena.com/display/PLAY/Ways+to+embed+your+video',
     'md5': 'b96f2f71b359a8ecd05ce4e1daa72365',
@@ -1652,7 +1657,9 @@ class GenericIE(InfoExtractor):
     return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
 elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
     info_dict['formats'] = self._parse_mpd_formats(
-        doc, video_id, mpd_base_url=url.rpartition('/')[0])
+        doc, video_id,
+        mpd_base_url=full_response.geturl().rpartition('/')[0],
+        mpd_url=url)
     self._sort_formats(info_dict['formats'])
     return info_dict
 elif re.match(r'^{http://ns\.adobe\.com/f4m/[12]\.0}manifest$', doc.tag):
@@ -2249,6 +2256,35 @@ class GenericIE(InfoExtractor):
     return self.url_result(
         self._proto_relative_url(unescapeHTML(mobj.group('url'))), 'VODPlatform')

+# Look for Mangomolo embeds
+mobj = re.search(
+    r'''(?x)<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?admin\.mangomolo\.com/analytics/index\.php/customers/embed/
+    (?:
+        video\?.*?\bid=(?P<video_id>\d+)|
+        index\?.*?\bchannelid=(?P<channel_id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)
+    ).+?)\1''', webpage)
+if mobj is not None:
+    info = {
+        '_type': 'url_transparent',
+        'url': self._proto_relative_url(unescapeHTML(mobj.group('url'))),
+        'title': video_title,
+        'description': video_description,
+        'thumbnail': video_thumbnail,
+        'uploader': video_uploader,
+    }
+    video_id = mobj.group('video_id')
+    if video_id:
+        info.update({
+            'ie_key': 'MangomoloVideo',
+            'id': video_id,
+        })
+    else:
+        info.update({
+            'ie_key': 'MangomoloLive',
+            'id': mobj.group('channel_id'),
+        })
+    return info
+
 # Look for Instagram embeds
 instagram_embed_url = InstagramIE._extract_embed_url(webpage)
 if instagram_embed_url is not None:


@@ -2,6 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import random import random
import re
import math import math
from .common import InfoExtractor from .common import InfoExtractor
@@ -14,6 +15,7 @@ from ..utils import (
ExtractorError, ExtractorError,
float_or_none, float_or_none,
int_or_none, int_or_none,
orderedSet,
str_or_none, str_or_none,
) )
@@ -63,6 +65,9 @@ class GloboIE(InfoExtractor):
}, { }, {
'url': 'http://canaloff.globo.com/programas/desejar-profundo/videos/4518560.html', 'url': 'http://canaloff.globo.com/programas/desejar-profundo/videos/4518560.html',
'only_matching': True, 'only_matching': True,
}, {
'url': 'globo:3607726',
'only_matching': True,
}] }]
class MD5(object): class MD5(object):
@@ -396,7 +401,7 @@ class GloboIE(InfoExtractor):
class GloboArticleIE(InfoExtractor): class GloboArticleIE(InfoExtractor):
_VALID_URL = r'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/]+)(?:\.html)?' _VALID_URL = r'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/.]+)(?:\.html)?'
_VIDEOID_REGEXES = [ _VIDEOID_REGEXES = [
 r'\bdata-video-id=["\'](\d{7,})',
@@ -408,15 +413,20 @@ class GloboArticleIE(InfoExtractor):
     _TESTS = [{
         'url': 'http://g1.globo.com/jornal-nacional/noticia/2014/09/novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes.html',
-        'md5': '307fdeae4390ccfe6ba1aa198cf6e72b',
         'info_dict': {
-            'id': '3652183',
-            'ext': 'mp4',
-            'title': 'Receita Federal explica como vai fiscalizar bagagens de quem retorna ao Brasil de avião',
-            'duration': 110.711,
-            'uploader': 'Rede Globo',
-            'uploader_id': '196',
-        }
+            'id': 'novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes',
+            'title': 'Novidade na fiscalização de bagagem pela Receita provoca discussões',
+            'description': 'md5:c3c4b4d4c30c32fce460040b1ac46b12',
+        },
+        'playlist_count': 1,
+    }, {
+        'url': 'http://g1.globo.com/pr/parana/noticia/2016/09/mpf-denuncia-lula-marisa-e-mais-seis-na-operacao-lava-jato.html',
+        'info_dict': {
+            'id': 'mpf-denuncia-lula-marisa-e-mais-seis-na-operacao-lava-jato',
+            'title': "Lula era o 'comandante máximo' do esquema da Lava Jato, diz MPF",
+            'description': 'md5:8aa7cc8beda4dc71cc8553e00b77c54c',
+        },
+        'playlist_count': 6,
     }, {
         'url': 'http://gq.globo.com/Prazeres/Poder/noticia/2015/10/all-o-desafio-assista-ao-segundo-capitulo-da-serie.html',
         'only_matching': True,
@@ -435,5 +445,12 @@ class GloboArticleIE(InfoExtractor):
     def _real_extract(self, url):
         display_id = self._match_id(url)
         webpage = self._download_webpage(url, display_id)
-        video_id = self._search_regex(self._VIDEOID_REGEXES, webpage, 'video id')
-        return self.url_result('globo:%s' % video_id, 'Globo')
+        video_ids = []
+        for video_regex in self._VIDEOID_REGEXES:
+            video_ids.extend(re.findall(video_regex, webpage))
+        entries = [
+            self.url_result('globo:%s' % video_id, GloboIE.ie_key())
+            for video_id in orderedSet(video_ids)]
+        title = self._og_search_title(webpage, fatal=False)
+        description = self._html_search_meta('description', webpage)
+        return self.playlist_result(entries, display_id, title, description)


@@ -8,6 +8,8 @@ from ..utils import (
     int_or_none,
     determine_ext,
     parse_age_limit,
+    urlencode_postdata,
+    ExtractorError,
 )
@@ -19,7 +21,7 @@ class GoIE(InfoExtractor):
         'watchdisneyjunior': '008',
         'watchdisneyxd': '009',
     }
-    _VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/.*?vdka(?P<id>\w+)' % '|'.join(_BRANDS.keys())
+    _VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:[^/]+/)*(?:vdka(?P<id>\w+)|season-\d+/\d+-(?P<display_id>[^/?#]+))' % '|'.join(_BRANDS.keys())
     _TESTS = [{
         'url': 'http://abc.go.com/shows/castle/video/most-recent/vdka0_g86w5onx',
         'info_dict': {
@@ -38,9 +40,13 @@ class GoIE(InfoExtractor):
     }]

     def _real_extract(self, url):
-        sub_domain, video_id = re.match(self._VALID_URL, url).groups()
+        sub_domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
+        if not video_id:
+            webpage = self._download_webpage(url, display_id)
+            video_id = self._search_regex(r'data-video-id=["\']VDKA(\w+)', webpage, 'video id')
+        brand = self._BRANDS[sub_domain]
         video_data = self._download_json(
-            'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/%s/001/-1/-1/-1/%s/-1/-1.json' % (self._BRANDS[sub_domain], video_id),
+            'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/%s/001/-1/-1/-1/%s/-1/-1.json' % (brand, video_id),
             video_id)['video'][0]
         title = video_data['title']
@@ -52,6 +58,21 @@ class GoIE(InfoExtractor):
             format_id = asset.get('format')
             ext = determine_ext(asset_url)
             if ext == 'm3u8':
+                video_type = video_data.get('type')
+                if video_type == 'lf':
+                    entitlement = self._download_json(
+                        'https://api.entitlement.watchabc.go.com/vp2/ws-secure/entitlement/2020/authorize.json',
+                        video_id, data=urlencode_postdata({
+                            'video_id': video_data['id'],
+                            'video_type': video_type,
+                            'brand': brand,
+                            'device': '001',
+                        }))
+                    errors = entitlement.get('errors', {}).get('errors', [])
+                    if errors:
+                        error_message = ', '.join([error['message'] for error in errors])
+                        raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message), expected=True)
+                    asset_url += '?' + entitlement['uplynkData']['sessionKey']
                 formats.extend(self._extract_m3u8_formats(
                     asset_url, video_id, 'mp4', m3u8_id=format_id or 'hls', fatal=False))
             else:
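The entitlement check added above joins every error message from the `authorize.json` response into a single `ExtractorError`. A rough sketch of just that aggregation step (the function name and sample payload are made up for illustration):

```python
def format_entitlement_errors(entitlement, ie_name='Go'):
    # Mirror the added error handling: the authorize.json response nests
    # a list of error dicts under errors.errors; join their messages.
    errors = entitlement.get('errors', {}).get('errors', [])
    if not errors:
        return None
    return '%s said: %s' % (ie_name, ', '.join(error['message'] for error in errors))

print(format_entitlement_errors(
    {'errors': {'errors': [{'message': 'fan gated'}, {'message': 'geo blocked'}]}}))
```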


@@ -10,7 +10,7 @@ from ..utils import unified_strdate
 class GooglePlusIE(InfoExtractor):
     IE_DESC = 'Google Plus'
-    _VALID_URL = r'https://plus\.google\.com/(?:[^/]+/)*?posts/(?P<id>\w+)'
+    _VALID_URL = r'https?://plus\.google\.com/(?:[^/]+/)*?posts/(?P<id>\w+)'
     IE_NAME = 'plus.google'
     _TEST = {
         'url': 'https://plus.google.com/u/0/108897254135232129896/posts/ZButuJc6CtH',


@@ -11,7 +11,7 @@ from ..utils import (
 class GoshgayIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.goshgay\.com/video(?P<id>\d+?)($|/)'
+    _VALID_URL = r'https?://(?:www\.)?goshgay\.com/video(?P<id>\d+?)($|/)'
     _TEST = {
         'url': 'http://www.goshgay.com/video299069/diesel_sfw_xxx_video',
         'md5': '4b6db9a0a333142eb9f15913142b0ed1',


@@ -5,7 +5,7 @@ from .common import InfoExtractor
 class HarkIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.hark\.com/clips/(?P<id>.+?)-.+'
+    _VALID_URL = r'https?://(?:www\.)?hark\.com/clips/(?P<id>.+?)-.+'
     _TEST = {
         'url': 'http://www.hark.com/clips/mmbzyhkgny-obama-beyond-the-afghan-theater-we-only-target-al-qaeda-on-may-23-2013',
         'md5': '6783a58491b47b92c7c1af5a77d4cbee',


@@ -12,7 +12,7 @@ from ..utils import (
 class HotNewHipHopIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.hotnewhiphop\.com/.*\.(?P<id>.*)\.html'
+    _VALID_URL = r'https?://(?:www\.)?hotnewhiphop\.com/.*\.(?P<id>.*)\.html'
     _TEST = {
         'url': 'http://www.hotnewhiphop.com/freddie-gibbs-lay-it-down-song.1435540.html',
         'md5': '2c2cd2f76ef11a9b3b581e8b232f3d96',


@@ -94,7 +94,7 @@ class ImdbIE(InfoExtractor):
 class ImdbListIE(InfoExtractor):
     IE_NAME = 'imdb:list'
     IE_DESC = 'Internet Movie Database lists'
-    _VALID_URL = r'https?://www\.imdb\.com/list/(?P<id>[\da-zA-Z_-]{11})'
+    _VALID_URL = r'https?://(?:www\.)?imdb\.com/list/(?P<id>[\da-zA-Z_-]{11})'
     _TEST = {
         'url': 'http://www.imdb.com/list/JFs9NWw6XI0',
         'info_dict': {


@@ -0,0 +1,77 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlparse
from ..utils import remove_end
class IwaraIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.|ecchi\.)?iwara\.tv/videos/(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'http://iwara.tv/videos/amVwUl1EHpAD9RD',
'md5': '1d53866b2c514b23ed69e4352fdc9839',
'info_dict': {
'id': 'amVwUl1EHpAD9RD',
'ext': 'mp4',
'title': '【MMD R-18】ガールフレンド carry_me_off',
'age_limit': 18,
},
}, {
'url': 'http://ecchi.iwara.tv/videos/Vb4yf2yZspkzkBO',
'md5': '7e5f1f359cd51a027ba4a7b7710a50f0',
'info_dict': {
'id': '0B1LvuHnL-sRFNXB1WHNqbGw4SXc',
'ext': 'mp4',
'title': '[3D Hentai] Kyonyu Ã\x97 Genkai Ã\x97 Emaki Shinobi Girls.mp4',
'age_limit': 18,
},
'add_ie': ['GoogleDrive'],
}, {
'url': 'http://www.iwara.tv/videos/nawkaumd6ilezzgq',
'md5': '1d85f1e5217d2791626cff5ec83bb189',
'info_dict': {
'id': '6liAP9s2Ojc',
'ext': 'mp4',
'age_limit': 0,
'title': '[MMD] Do It Again Ver.2 [1080p 60FPS] (Motion,Camera,Wav+DL)',
'description': 'md5:590c12c0df1443d833fbebe05da8c47a',
'upload_date': '20160910',
'uploader': 'aMMDsork',
'uploader_id': 'UCVOFyOSCyFkXTYYHITtqB7A',
},
'add_ie': ['Youtube'],
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage, urlh = self._download_webpage_handle(url, video_id)
hostname = compat_urllib_parse_urlparse(urlh.geturl()).hostname
# ecchi is 'sexy' in Japanese
age_limit = 18 if hostname.split('.')[0] == 'ecchi' else 0
entries = self._parse_html5_media_entries(url, webpage, video_id)
if not entries:
iframe_url = self._html_search_regex(
r'<iframe[^>]+src=([\'"])(?P<url>[^\'"]+)\1',
webpage, 'iframe URL', group='url')
return {
'_type': 'url_transparent',
'url': iframe_url,
'age_limit': age_limit,
}
title = remove_end(self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title'), ' | Iwara')
info_dict = entries[0]
info_dict.update({
'id': video_id,
'title': title,
'age_limit': age_limit,
})
return info_dict
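The new Iwara extractor decides the age limit purely from the subdomain of the final (post-redirect) URL. That check, pulled out as a standalone sketch (the helper name and sample URLs are illustrative):

```python
from urllib.parse import urlparse


def iwara_age_limit(final_url):
    # Mirrors the extractor's logic: videos served from the ecchi.
    # subdomain ('ecchi' is 'sexy' in Japanese) are age-restricted.
    hostname = urlparse(final_url).hostname
    return 18 if hostname.split('.')[0] == 'ecchi' else 0

print(iwara_age_limit('http://ecchi.iwara.tv/videos/Vb4yf2yZspkzkBO'))  # -> 18
print(iwara_age_limit('http://www.iwara.tv/videos/nawkaumd6ilezzgq'))   # -> 0
```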


@@ -9,6 +9,7 @@ from ..utils import (
     determine_ext,
     float_or_none,
     int_or_none,
+    js_to_json,
     mimetype2ext,
 )
@@ -19,14 +20,15 @@ class JWPlatformBaseIE(InfoExtractor):
         # TODO: Merge this with JWPlayer-related codes in generic.py
         mobj = re.search(
-            'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\((?P<options>[^)]+)\)',
+            r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)',
            webpage)
         if mobj:
             return mobj.group('options')

     def _extract_jwplayer_data(self, webpage, video_id, *args, **kwargs):
         jwplayer_data = self._parse_json(
-            self._find_jwplayer_data(webpage), video_id)
+            self._find_jwplayer_data(webpage), video_id,
+            transform_source=js_to_json)
         return self._parse_jwplayer_data(
             jwplayer_data, video_id, *args, **kwargs)
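Besides switching the parser to `js_to_json`, the only change to the detection pattern is the added `\s*` before the opening parenthesis of `.setup(...)`. A quick check of what that buys (the page snippet is hypothetical):

```python
import re

# Same pattern as _find_jwplayer_data after the change (now a raw string, with \s* added)
JWPLAYER_RE = r'jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)\.setup\s*\((?P<options>[^)]+)\)'

# Pages that put whitespace between .setup and its argument now match too
snippet = "jwplayer('video-player').setup ({file: 'x.m3u8'})"
options = re.search(JWPLAYER_RE, snippet).group('options')
print(options)  # -> {file: 'x.m3u8'}
```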


@@ -262,8 +262,16 @@ class KalturaIE(InfoExtractor):
                 # Continue if asset is not ready
                 if f.get('status') != 2:
                     continue
+                # Original format that's not available (e.g. kaltura:1926081:0_c03e1b5g)
+                # skip for now.
+                if f.get('fileExt') == 'chun':
+                    continue
                 video_url = sign_url(
                     '%s/flavorId/%s' % (data_url, f['id']))
+                # audio-only has no videoCodecId (e.g. kaltura:1926081:0_c03e1b5g
+                # -f mp4-56)
+                vcodec = 'none' if 'videoCodecId' not in f and f.get(
+                    'frameRate') == 0 else f.get('videoCodecId')
                 formats.append({
                     'format_id': '%(fileExt)s-%(bitrate)s' % f,
                     'ext': f.get('fileExt'),
@@ -271,7 +279,7 @@ class KalturaIE(InfoExtractor):
                     'fps': int_or_none(f.get('frameRate')),
                     'filesize_approx': int_or_none(f.get('size'), invscale=1024),
                     'container': f.get('containerFormat'),
-                    'vcodec': f.get('videoCodecId'),
+                    'vcodec': vcodec,
                     'height': int_or_none(f.get('height')),
                     'width': int_or_none(f.get('width')),
                     'url': video_url,
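The `vcodec` fix above collapses to a small predicate: a flavor with no `videoCodecId` and a zero `frameRate` is treated as audio-only. Isolated as a sketch (the helper name and sample flavor dicts are illustrative):

```python
def guess_vcodec(flavor):
    # Mirrors the Kaltura change: flavors with no videoCodecId and a zero
    # frameRate are audio-only, so report vcodec as 'none'; otherwise pass
    # through whatever codec id the API gave us (possibly None).
    if 'videoCodecId' not in flavor and flavor.get('frameRate') == 0:
        return 'none'
    return flavor.get('videoCodecId')

print(guess_vcodec({'frameRate': 0}))                           # -> none
print(guess_vcodec({'videoCodecId': 'avc1', 'frameRate': 25}))  # -> avc1
```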


@@ -5,7 +5,7 @@ from .common import InfoExtractor
 class KaraoketvIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.karaoketv\.co\.il/[^/]+/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?karaoketv\.co\.il/[^/]+/(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.karaoketv.co.il/%D7%A9%D7%99%D7%A8%D7%99_%D7%A7%D7%A8%D7%99%D7%95%D7%A7%D7%99/58356/%D7%90%D7%99%D7%96%D7%95%D7%9F',
         'info_dict': {


@@ -0,0 +1,52 @@
from __future__ import unicode_literals
from .common import InfoExtractor
class KetnetIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ketnet\.be/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.ketnet.be/kijken/zomerse-filmpjes',
'md5': 'd907f7b1814ef0fa285c0475d9994ed7',
'info_dict': {
'id': 'zomerse-filmpjes',
'ext': 'mp4',
'title': 'Gluur mee op de filmset en op Pennenzakkenrock',
'description': 'Gluur mee met Ghost Rockers op de filmset',
'thumbnail': 're:^https?://.*\.jpg$',
}
}, {
'url': 'https://www.ketnet.be/kijken/karrewiet/uitzending-8-september-2016',
'only_matching': True,
}, {
'url': 'https://www.ketnet.be/achter-de-schermen/sien-repeteert-voor-stars-for-life',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
config = self._parse_json(
self._search_regex(
r'(?s)playerConfig\s*=\s*({.+?})\s*;', webpage,
'player config'),
video_id)
title = config['title']
formats = self._extract_m3u8_formats(
config['source']['hls'], video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': config.get('description'),
'thumbnail': config.get('image'),
'series': config.get('program'),
'episode': config.get('episode'),
'formats': formats,
}


@@ -6,7 +6,7 @@ from ..utils import smuggle_url
 class KickStarterIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.kickstarter\.com/projects/(?P<id>[^/]*)/.*'
+    _VALID_URL = r'https?://(?:www\.)?kickstarter\.com/projects/(?P<id>[^/]*)/.*'
     _TESTS = [{
         'url': 'https://www.kickstarter.com/projects/1404461844/intersection-the-story-of-josh-grant/description',
         'md5': 'c81addca81327ffa66c642b5d8b08cab',


@@ -59,7 +59,7 @@ class KuwoBaseIE(InfoExtractor):
 class KuwoIE(KuwoBaseIE):
     IE_NAME = 'kuwo:song'
     IE_DESC = '酷我音乐'
-    _VALID_URL = r'https?://www\.kuwo\.cn/yinyue/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?kuwo\.cn/yinyue/(?P<id>\d+)'
     _TESTS = [{
         'url': 'http://www.kuwo.cn/yinyue/635632/',
         'info_dict': {
@@ -82,7 +82,7 @@ class KuwoIE(KuwoBaseIE):
             'upload_date': '20150518',
         },
         'params': {
-            'format': 'mp3-320'
+            'format': 'mp3-320',
         },
     }, {
         'url': 'http://www.kuwo.cn/yinyue/3197154?catalog=yueku2016',
@@ -91,10 +91,10 @@ class KuwoIE(KuwoBaseIE):
     def _real_extract(self, url):
         song_id = self._match_id(url)
-        webpage = self._download_webpage(
+        webpage, urlh = self._download_webpage_handle(
             url, song_id, note='Download song detail info',
             errnote='Unable to get song detail info')
-        if '对不起,该歌曲由于版权问题已被下线,将返回网站首页' in webpage:
+        if song_id not in urlh.geturl() or '对不起,该歌曲由于版权问题已被下线,将返回网站首页' in webpage:
             raise ExtractorError('this song has been offline because of copyright issues', expected=True)

         song_name = self._html_search_regex(
@@ -139,7 +139,7 @@ class KuwoIE(KuwoBaseIE):
 class KuwoAlbumIE(InfoExtractor):
     IE_NAME = 'kuwo:album'
     IE_DESC = '酷我音乐 - 专辑'
-    _VALID_URL = r'https?://www\.kuwo\.cn/album/(?P<id>\d+?)/'
+    _VALID_URL = r'https?://(?:www\.)?kuwo\.cn/album/(?P<id>\d+?)/'
     _TEST = {
         'url': 'http://www.kuwo.cn/album/502294/',
         'info_dict': {
@@ -181,7 +181,7 @@ class KuwoChartIE(InfoExtractor):
         'info_dict': {
             'id': '香港中文龙虎榜',
         },
-        'playlist_mincount': 10,
+        'playlist_mincount': 7,
     }

     def _real_extract(self, url):
@@ -200,7 +200,7 @@ class KuwoChartIE(InfoExtractor):
 class KuwoSingerIE(InfoExtractor):
     IE_NAME = 'kuwo:singer'
     IE_DESC = '酷我音乐 - 歌手'
-    _VALID_URL = r'https?://www\.kuwo\.cn/mingxing/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?kuwo\.cn/mingxing/(?P<id>[^/]+)'
     _TESTS = [{
         'url': 'http://www.kuwo.cn/mingxing/bruno+mars/',
         'info_dict': {
@@ -296,14 +296,14 @@ class KuwoCategoryIE(InfoExtractor):
 class KuwoMvIE(KuwoBaseIE):
     IE_NAME = 'kuwo:mv'
     IE_DESC = '酷我音乐 - MV'
-    _VALID_URL = r'https?://www\.kuwo\.cn/mv/(?P<id>\d+?)/'
+    _VALID_URL = r'https?://(?:www\.)?kuwo\.cn/mv/(?P<id>\d+?)/'
     _TEST = {
         'url': 'http://www.kuwo.cn/mv/6480076/',
         'info_dict': {
             'id': '6480076',
             'ext': 'mp4',
             'title': 'My HouseMV',
-            'creator': 'PM02:00',
+            'creator': '2PM',
         },
         # In this video, music URLs (anti.s) are blocked outside China and
         # USA, while the MV URL (mvurl) is available globally, so force the MV


@@ -14,7 +14,7 @@ from ..utils import (
 class LiTVIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.litv\.tv/(?:vod|promo)/[^/]+/(?:content\.do)?\?.*?\b(?:content_)?id=(?P<id>[^&]+)'
+    _VALID_URL = r'https?://(?:www\.)?litv\.tv/(?:vod|promo)/[^/]+/(?:content\.do)?\?.*?\b(?:content_)?id=(?P<id>[^&]+)'
     _URL_TEMPLATE = 'https://www.litv.tv/vod/%s/content.do?id=%s'


@@ -1,8 +1,11 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import re
+
 from .common import InfoExtractor
 from ..utils import (
+    determine_ext,
     int_or_none,
     parse_duration,
     remove_end,
@@ -12,8 +15,10 @@ from ..utils import (
 class LRTIE(InfoExtractor):
     IE_NAME = 'lrt.lt'
     _VALID_URL = r'https?://(?:www\.)?lrt\.lt/mediateka/irasas/(?P<id>[0-9]+)'
-    _TEST = {
+    _TESTS = [{
+        # m3u8 download
         'url': 'http://www.lrt.lt/mediateka/irasas/54391/',
+        'md5': 'fe44cf7e4ab3198055f2c598fc175cb0',
         'info_dict': {
             'id': '54391',
             'ext': 'mp4',
@@ -23,20 +28,45 @@ class LRTIE(InfoExtractor):
             'view_count': int,
             'like_count': int,
         },
-        'params': {
-            'skip_download': True,  # m3u8 download
+    }, {
+        # direct mp3 download
+        'url': 'http://www.lrt.lt/mediateka/irasas/1013074524/',
+        'md5': '389da8ca3cad0f51d12bed0c844f6a0a',
+        'info_dict': {
+            'id': '1013074524',
+            'ext': 'mp3',
+            'title': 'Kita tema 2016-09-05 15:05',
+            'description': 'md5:1b295a8fc7219ed0d543fc228c931fb5',
+            'duration': 3008,
+            'view_count': int,
+            'like_count': int,
         },
-    }
+    }]

     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)

         title = remove_end(self._og_search_title(webpage), ' - LRT')
-        m3u8_url = self._search_regex(
-            r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*location\.hash\.substring\(1\)',
-            webpage, 'm3u8 url', group='url')
-        formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
+
+        formats = []
+        for _, file_url in re.findall(
+                r'file\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage):
+            ext = determine_ext(file_url)
+            if ext not in ('m3u8', 'mp3'):
+                continue
+            # mp3 served as m3u8 produces stuttered media file
+            if ext == 'm3u8' and '.mp3' in file_url:
+                continue
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    file_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                    fatal=False))
+            elif ext == 'mp3':
+                formats.append({
+                    'url': file_url,
+                    'vcodec': 'none',
+                })
         self._sort_formats(formats)

         thumbnail = self._og_search_thumbnail(webpage)
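The LRT change keeps m3u8 and mp3 candidates but drops mp3 files wrapped in an m3u8 playlist, since those produce stuttered audio. The filter can be sketched on its own (the helper name and example URLs are illustrative, and a simple path-suffix check stands in for youtube-dl's `determine_ext`):

```python
import posixpath


def keep_format(file_url):
    # Keep m3u8 and mp3 URLs; drop mp3 served through an m3u8 wrapper
    # (stuttered playback) and everything else.
    ext = posixpath.splitext(file_url.partition('?')[0])[1].lstrip('.')
    if ext not in ('m3u8', 'mp3'):
        return False
    if ext == 'm3u8' and '.mp3' in file_url:
        return False
    return True

print(keep_format('http://example.com/stream/playlist.m3u8'))       # -> True
print(keep_format('http://example.com/audio/show.mp3'))             # -> True
print(keep_format('http://example.com/audio/show.mp3/playlist.m3u8'))  # -> False
```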


@@ -94,7 +94,7 @@ class LyndaBaseIE(InfoExtractor):
 class LyndaIE(LyndaBaseIE):
     IE_NAME = 'lynda'
     IE_DESC = 'lynda.com videos'
-    _VALID_URL = r'https?://www\.lynda\.com/(?:[^/]+/[^/]+/\d+|player/embed)/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?lynda\.com/(?:[^/]+/[^/]+/\d+|player/embed)/(?P<id>\d+)'
     _TIMECODE_REGEX = r'\[(?P<timecode>\d+:\d+:\d+[\.,]\d+)\]'


@@ -7,7 +7,7 @@ from ..utils import ExtractorError
 class MacGameStoreIE(InfoExtractor):
     IE_NAME = 'macgamestore'
     IE_DESC = 'MacGameStore trailers'
-    _VALID_URL = r'https?://www\.macgamestore\.com/mediaviewer\.php\?trailer=(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?macgamestore\.com/mediaviewer\.php\?trailer=(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.macgamestore.com/mediaviewer.php?trailer=2450',


@@ -0,0 +1,54 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import (
int_or_none,
)
class MangomoloBaseIE(InfoExtractor):
def _get_real_id(self, page_id):
return page_id
def _real_extract(self, url):
page_id = self._get_real_id(self._match_id(url))
webpage = self._download_webpage(url, page_id)
hidden_inputs = self._hidden_inputs(webpage)
m3u8_entry_protocol = 'm3u8' if self._IS_LIVE else 'm3u8_native'
format_url = self._html_search_regex(
[
r'file\s*:\s*"(https?://[^"]+?/playlist.m3u8)',
r'<a[^>]+href="(rtsp://[^"]+)"'
], webpage, 'format url')
formats = self._extract_wowza_formats(
format_url, page_id, m3u8_entry_protocol, ['smil'])
self._sort_formats(formats)
return {
'id': page_id,
'title': self._live_title(page_id) if self._IS_LIVE else page_id,
'uploader_id': hidden_inputs.get('userid'),
'duration': int_or_none(hidden_inputs.get('duration')),
'is_live': self._IS_LIVE,
'formats': formats,
}
class MangomoloVideoIE(MangomoloBaseIE):
IE_NAME = 'mangomolo:video'
_VALID_URL = r'https?://admin\.mangomolo\.com/analytics/index\.php/customers/embed/video\?.*?\bid=(?P<id>\d+)'
_IS_LIVE = False
class MangomoloLiveIE(MangomoloBaseIE):
IE_NAME = 'mangomolo:live'
_VALID_URL = r'https?://admin\.mangomolo\.com/analytics/index\.php/customers/embed/index\?.*?\bchannelid=(?P<id>(?:[A-Za-z0-9+/=]|%2B|%2F|%3D)+)'
_IS_LIVE = True
def _get_real_id(self, page_id):
return base64.b64decode(compat_urllib_parse_unquote(page_id).encode()).decode()
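`MangomoloLiveIE._get_real_id` above reverses two layers of encoding in the `channelid` query value: percent-encoding, then base64. A standalone sketch of that decode (the function name and the sample id are illustrative):

```python
import base64
from urllib.parse import unquote


def decode_channel_id(page_id):
    # Mirrors MangomoloLiveIE._get_real_id: the channelid query value is
    # a percent-encoded base64 string wrapping the real channel id.
    return base64.b64decode(unquote(page_id).encode()).decode()

# hypothetical channel id for illustration
print(decode_channel_id('MTIzNDU%3D'))  # -> 12345
```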


@@ -9,7 +9,7 @@ from ..utils import (
 class MetacriticIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.metacritic\.com/.+?/trailers/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?metacritic\.com/.+?/trailers/(?P<id>\d+)'
     _TESTS = [{
         'url': 'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222',


@@ -6,7 +6,7 @@ from ..utils import int_or_none
 class MGTVIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html'
     IE_DESC = '芒果TV'
     _TESTS = [{


@@ -8,7 +8,7 @@ from ..utils import (
 class MinistryGridIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.ministrygrid.com/([^/?#]*/)*(?P<id>[^/#?]+)/?(?:$|[?#])'
+    _VALID_URL = r'https?://(?:www\.)?ministrygrid\.com/([^/?#]*/)*(?P<id>[^/#?]+)/?(?:$|[?#])'
     _TEST = {
         'url': 'http://www.ministrygrid.com/training-viewer/-/training/t4g-2014-conference/the-gospel-by-numbers-4/the-gospel-by-numbers',


@@ -74,7 +74,7 @@ class MiTeleBaseIE(InfoExtractor):
 class MiTeleIE(MiTeleBaseIE):
     IE_DESC = 'mitele.es'
-    _VALID_URL = r'https?://www\.mitele\.es/(?:[^/]+/){3}(?P<id>[^/]+)/'
+    _VALID_URL = r'https?://(?:www\.)?mitele\.es/(?:[^/]+/){3}(?P<id>[^/]+)/'
     _TESTS = [{
         'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',


@@ -9,7 +9,7 @@ from ..compat import (
 class MotorsportIE(InfoExtractor):
     IE_DESC = 'motorsport.com'
-    _VALID_URL = r'https?://www\.motorsport\.com/[^/?#]+/video/(?:[^/?#]+/)(?P<id>[^/]+)/?(?:$|[?#])'
+    _VALID_URL = r'https?://(?:www\.)?motorsport\.com/[^/?#]+/video/(?:[^/?#]+/)(?P<id>[^/]+)/?(?:$|[?#])'
     _TEST = {
         'url': 'http://www.motorsport.com/f1/video/main-gallery/red-bull-racing-2014-rules-explained/',
         'info_dict': {


@@ -7,7 +7,7 @@ from .common import InfoExtractor
 class MoviezineIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.moviezine\.se/video/(?P<id>[^?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?moviezine\.se/video/(?P<id>[^?#]+)'
     _TEST = {
         'url': 'http://www.moviezine.se/video/205866',


@@ -11,7 +11,7 @@ from ..utils import (
 class MySpassIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.myspass\.de/.*'
+    _VALID_URL = r'https?://(?:www\.)?myspass\.de/.*'
     _TEST = {
         'url': 'http://www.myspass.de/myspass/shows/tvshows/absolute-mehrheit/Absolute-Mehrheit-vom-17022013-Die-Highlights-Teil-2--/11741/',
         'md5': '0b49f4844a068f8b33f4b7c88405862b',


@@ -13,7 +13,7 @@ from ..utils import (
 class NBCIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.nbc\.com/(?:[^/]+/)+(?P<id>n?\d+)'
+    _VALID_URL = r'https?://(?:www\.)?nbc\.com/(?:[^/]+/)+(?P<id>n?\d+)'
     _TESTS = [
         {
@@ -138,7 +138,7 @@ class NBCSportsVPlayerIE(InfoExtractor):
 class NBCSportsIE(InfoExtractor):
     # Does not include https because its certificate is invalid
-    _VALID_URL = r'https?://www\.nbcsports\.com//?(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
+    _VALID_URL = r'https?://(?:www\.)?nbcsports\.com//?(?:[^/]+/)+(?P<id>[0-9a-z-]+)'
     _TEST = {
         'url': 'http://www.nbcsports.com//college-basketball/ncaab/tom-izzo-michigan-st-has-so-much-respect-duke',
@@ -161,7 +161,7 @@ class NBCSportsIE(InfoExtractor):
 class CSNNEIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.csnne\.com/video/(?P<id>[0-9a-z-]+)'
+    _VALID_URL = r'https?://(?:www\.)?csnne\.com/video/(?P<id>[0-9a-z-]+)'
     _TEST = {
         'url': 'http://www.csnne.com/video/snc-evening-update-wright-named-red-sox-no-5-starter',
@@ -335,3 +335,43 @@ class NBCNewsIE(ThePlatformIE):
             'url': 'http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews?byId=%s' % video_id,
             'ie_key': 'ThePlatformFeed',
         }
class NBCOlympicsIE(InfoExtractor):
_VALID_URL = r'https?://www\.nbcolympics\.com/video/(?P<id>[a-z-]+)'
_TEST = {
# Geo-restricted to US
'url': 'http://www.nbcolympics.com/video/justin-roses-son-leo-was-tears-after-his-dad-won-gold',
'md5': '54fecf846d05429fbaa18af557ee523a',
'info_dict': {
'id': 'WjTBzDXx5AUq',
'display_id': 'justin-roses-son-leo-was-tears-after-his-dad-won-gold',
'ext': 'mp4',
'title': 'Rose\'s son Leo was in tears after his dad won gold',
'description': 'Olympic gold medalist Justin Rose gets emotional talking to the impact his win in men\'s golf has already had on his children.',
'timestamp': 1471274964,
'upload_date': '20160815',
'uploader': 'NBCU-SPORTS',
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
drupal_settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings'), display_id)
iframe_url = drupal_settings['vod']['iframe_url']
theplatform_url = iframe_url.replace(
'vplayer.nbcolympics.com', 'player.theplatform.com')
return {
'_type': 'url_transparent',
'url': theplatform_url,
'ie_key': ThePlatformIE.ie_key(),
'display_id': display_id,
}


@@ -23,7 +23,7 @@ class NDRBaseIE(InfoExtractor):
 class NDRIE(NDRBaseIE):
     IE_NAME = 'ndr'
     IE_DESC = 'NDR.de - Norddeutscher Rundfunk'
-    _VALID_URL = r'https?://www\.ndr\.de/(?:[^/]+/)*(?P<id>[^/?#]+),[\da-z]+\.html'
+    _VALID_URL = r'https?://(?:www\.)?ndr\.de/(?:[^/]+/)*(?P<id>[^/?#]+),[\da-z]+\.html'
     _TESTS = [{
         # httpVideo, same content id
         'url': 'http://www.ndr.de/fernsehen/Party-Poette-und-Parade,hafengeburtstag988.html',
@@ -105,7 +105,7 @@ class NDRIE(NDRBaseIE):
 class NJoyIE(NDRBaseIE):
     IE_NAME = 'njoy'
     IE_DESC = 'N-JOY'
-    _VALID_URL = r'https?://www\.n-joy\.de/(?:[^/]+/)*(?:(?P<display_id>[^/?#]+),)?(?P<id>[\da-z]+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?n-joy\.de/(?:[^/]+/)*(?:(?P<display_id>[^/?#]+),)?(?P<id>[\da-z]+)\.html'
     _TESTS = [{
         # httpVideo, same content id
         'url': 'http://www.n-joy.de/entertainment/comedy/comedy_contest/Benaissa-beim-NDR-Comedy-Contest,comedycontest2480.html',
@@ -238,7 +238,7 @@ class NDREmbedBaseIE(InfoExtractor):
 class NDREmbedIE(NDREmbedBaseIE):
     IE_NAME = 'ndr:embed'
-    _VALID_URL = r'https?://www\.ndr\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)\.html'
+    _VALID_URL = r'https?://(?:www\.)?ndr\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)\.html'
     _TESTS = [{
         'url': 'http://www.ndr.de/fernsehen/sendungen/ndr_aktuell/ndraktuell28488-player.html',
         'md5': '8b9306142fe65bbdefb5ce24edb6b0a9',
@@ -332,7 +332,7 @@ class NDREmbedIE(NDREmbedBaseIE):
 class NJoyEmbedIE(NDREmbedBaseIE):
     IE_NAME = 'njoy:embed'
-    _VALID_URL = r'https?://www\.n-joy\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)_[^/]+\.html'
+    _VALID_URL = r'https?://(?:www\.)?n-joy\.de/(?:[^/]+/)*(?P<id>[\da-z]+)-(?:player|externalPlayer)_[^/]+\.html'
     _TESTS = [{
         # httpVideo
         'url': 'http://www.n-joy.de/events/reeperbahnfestival/doku948-player_image-bc168e87-5263-4d6d-bd27-bb643005a6de_theme-n-joy.html',


@@ -1,15 +1,12 @@
 from __future__ import unicode_literals
-import json
-import re
 from .common import InfoExtractor
 class NewgroundsIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?newgrounds\.com/(?:audio/listen|portal/view)/(?P<id>[0-9]+)'
     _TESTS = [{
-        'url': 'http://www.newgrounds.com/audio/listen/549479',
+        'url': 'https://www.newgrounds.com/audio/listen/549479',
         'md5': 'fe6033d297591288fa1c1f780386f07a',
         'info_dict': {
             'id': '549479',
@@ -18,7 +15,7 @@ class NewgroundsIE(InfoExtractor):
             'uploader': 'Burn7',
         }
     }, {
-        'url': 'http://www.newgrounds.com/portal/view/673111',
+        'url': 'https://www.newgrounds.com/portal/view/673111',
         'md5': '3394735822aab2478c31b1004fe5e5bc',
         'info_dict': {
             'id': '673111',
@@ -29,24 +26,20 @@ class NewgroundsIE(InfoExtractor):
     }]
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        music_id = mobj.group('id')
-        webpage = self._download_webpage(url, music_id)
+        media_id = self._match_id(url)
+        webpage = self._download_webpage(url, media_id)
         title = self._html_search_regex(
             r'<title>([^>]+)</title>', webpage, 'title')
         uploader = self._html_search_regex(
-            [r',"artist":"([^"]+)",', r'[\'"]owner[\'"]\s*:\s*[\'"]([^\'"]+)[\'"],'],
-            webpage, 'uploader')
+            r'Author\s*<a[^>]+>([^<]+)', webpage, 'uploader', fatal=False)
-        music_url_json_string = self._html_search_regex(
-            r'({"url":"[^"]+"),', webpage, 'music url') + '}'
-        music_url_json = json.loads(music_url_json_string)
-        music_url = music_url_json['url']
+        music_url = self._parse_json(self._search_regex(
+            r'"url":("[^"]+"),', webpage, ''), media_id)
         return {
-            'id': music_id,
+            'id': media_id,
             'title': title,
             'url': music_url,
             'uploader': uploader,
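The new Newgrounds code pulls the media URL by capturing a JSON string literal (quotes included) and letting the JSON decoder unescape it. A standalone sketch of the same idea with only the stdlib; the `webpage` snippet is an invented stand-in, not real Newgrounds markup:

```python
import json
import re

# invented stand-in for the page source; real Newgrounds markup differs
webpage = '{"params":{"url":"https:\\/\\/audio.ngfiles.com\\/549000\\/549479.mp3","length":231}}'

# capture the string literal with its quotes, then let json.loads do the
# unescaping (e.g. the \/ sequences) instead of string surgery
music_url = json.loads(re.search(r'"url":("[^"]+"),', webpage).group(1))
```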


@@ -7,7 +7,7 @@ from ..utils import parse_iso8601
 class NextMediaIE(InfoExtractor):
     IE_DESC = '蘋果日報'
-    _VALID_URL = r'https?://hk.apple.nextmedia.com/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)'
+    _VALID_URL = r'https?://hk\.apple\.nextmedia\.com/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)'
     _TESTS = [{
         'url': 'http://hk.apple.nextmedia.com/realtime/news/20141108/53109199',
         'md5': 'dff9fad7009311c421176d1ac90bfe4f',
@@ -68,7 +68,7 @@ class NextMediaIE(InfoExtractor):
 class NextMediaActionNewsIE(NextMediaIE):
     IE_DESC = '蘋果日報 - 動新聞'
-    _VALID_URL = r'https?://hk.dv.nextmedia.com/actionnews/[^/]+/(?P<date>\d+)/(?P<id>\d+)/\d+'
+    _VALID_URL = r'https?://hk\.dv\.nextmedia\.com/actionnews/[^/]+/(?P<date>\d+)/(?P<id>\d+)/\d+'
     _TESTS = [{
         'url': 'http://hk.dv.nextmedia.com/actionnews/hit/20150121/19009428/20061460',
         'md5': '05fce8ffeed7a5e00665d4b7cf0f9201',
@@ -93,7 +93,7 @@ class NextMediaActionNewsIE(NextMediaIE):
 class AppleDailyIE(NextMediaIE):
     IE_DESC = '臺灣蘋果日報'
-    _VALID_URL = r'https?://(www|ent).appledaily.com.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
+    _VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
     _TESTS = [{
         'url': 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694',
         'md5': 'a843ab23d150977cc55ef94f1e2c1e4d',


@@ -165,7 +165,7 @@ class NFLIE(InfoExtractor):
             group='config'))
         # For articles, the id in the url is not the video id
         video_id = self._search_regex(
-            r'(?:<nflcs:avplayer[^>]+data-content[Ii]d\s*=\s*|content[Ii]d\s*:\s*)(["\'])(?P<id>.+?)\1',
+            r'(?:<nflcs:avplayer[^>]+data-content[Ii]d\s*=\s*|content[Ii]d\s*:\s*)(["\'])(?P<id>(?:(?!\1).)+)\1',
             webpage, 'video id', default=video_id, group='id')
         config = self._download_json(config_url, video_id, 'Downloading player config')
         url_template = NFLIE.prepend_host(
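Several commits in this set swap a lazy `.+?` for the "tempered" token `(?:(?!\1).)+`, which forbids the opening delimiter from appearing inside the captured value. A minimal sketch of why that matters, using a contrived page snippet (not real NFL markup) where an empty attribute precedes the real one:

```python
import re

# contrived snippet: an empty contentId followed by the real one
page = 'contentId: "" , contentId: "abc123"'

lazy = r'contentId\s*:\s*(["\'])(?P<id>.+?)\1'
tempered = r'contentId\s*:\s*(["\'])(?P<id>(?:(?!\1).)+)\1'

# the lazy .+? needs at least one character, so it swallows the closing
# quote and keeps going until the NEXT quote character
bad = re.search(lazy, page).group('id')
# the tempered token rejects the delimiter inside the value, so the
# match attempt at the empty value fails and the real id is captured
good = re.search(tempered, page).group('id')
```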


@@ -1,14 +1,15 @@
 from __future__ import unicode_literals
 from .common import InfoExtractor
+from ..utils import ExtractorError
 class NhkVodIE(InfoExtractor):
-    _VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/en/vod/(?P<id>.+?)\.html'
+    _VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/en/vod/(?P<id>[^/]+/[^/?#&]+)'
     _TEST = {
         # Videos available only for a limited period of time. Visit
         # http://www3.nhk.or.jp/nhkworld/en/vod/ for working samples.
-        'url': 'http://www3.nhk.or.jp/nhkworld/en/vod/tokyofashion/20160815.html',
+        'url': 'http://www3.nhk.or.jp/nhkworld/en/vod/tokyofashion/20160815',
         'info_dict': {
             'id': 'A1bnNiNTE6nY3jLllS-BIISfcC_PpvF5',
             'ext': 'flv',
@@ -19,25 +20,25 @@ class NhkVodIE(InfoExtractor):
         },
         'skip': 'Videos available only for a limited period of time',
     }
+    _API_URL = 'http://api.nhk.or.jp/nhkworld/vodesdlist/v1/all/all/all.json?apikey=EJfK8jdS57GqlupFgAfAAwr573q01y6k'
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-        embed_code = self._search_regex(
-            r'nw_vod_ooplayer\([^,]+,\s*(["\'])(?P<id>(?:(?!\1).)+)\1',
-            webpage, 'ooyala embed code', group='id')
-        title = self._search_regex(
-            r'<div[^>]+class=["\']episode-detail["\']>\s*<h\d+>([^<]+)',
-            webpage, 'title', default=None)
-        description = self._html_search_regex(
-            r'(?s)<p[^>]+class=["\']description["\'][^>]*>(.+?)</p>',
-            webpage, 'description', default=None)
-        series = self._search_regex(
-            r'<h2[^>]+class=["\']detail-top-player-title[^>]+><a[^>]+>([^<]+)',
-            webpage, 'series', default=None)
+        data = self._download_json(self._API_URL, video_id)
+        try:
+            episode = next(
+                e for e in data['data']['episodes']
+                if e.get('url') and video_id in e['url'])
+        except StopIteration:
+            raise ExtractorError('Unable to find episode')
+        embed_code = episode['vod_id']
+        title = episode.get('sub_title_clean') or episode['sub_title']
+        description = episode.get('description_clean') or episode.get('description')
+        series = episode.get('title_clean') or episode.get('title')
         return {
             '_type': 'url_transparent',
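The rewritten NhkVod extractor finds its episode with `next()` over a generator expression and converts the resulting `StopIteration` into a readable error. A self-contained sketch of that lookup pattern; the episode list and ids below are invented stand-ins for the API response:

```python
# invented stand-in for the episode list returned by the API
episodes = [
    {'url': '/nhkworld/en/vod/tokyofashion/20160815', 'vod_id': 'A1bn'},
    {'url': '/nhkworld/en/vod/newsline/20160816', 'vod_id': 'B2cd'},
]

def find_episode(episodes, video_id):
    # next() returns the first generator hit; StopIteration means no
    # episode qualified and is turned into a caller-friendly error
    try:
        return next(
            e for e in episodes
            if e.get('url') and video_id in e['url'])
    except StopIteration:
        raise LookupError('Unable to find episode')

episode = find_episode(episodes, 'newsline/20160816')
```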


@@ -252,7 +252,7 @@ class NiconicoIE(InfoExtractor):
 class NiconicoPlaylistIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.nicovideo\.jp/mylist/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?nicovideo\.jp/mylist/(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.nicovideo.jp/mylist/27411728',


@@ -44,7 +44,20 @@ class NineNowIE(InfoExtractor):
         page_data = self._parse_json(self._search_regex(
             r'window\.__data\s*=\s*({.*?});', webpage,
             'page data'), display_id)
-        common_data = page_data.get('episode', {}).get('episode') or page_data.get('clip', {}).get('clip')
+        for kind in ('episode', 'clip'):
+            current_key = page_data.get(kind, {}).get(
+                'current%sKey' % kind.capitalize())
+            if not current_key:
+                continue
+            cache = page_data.get(kind, {}).get('%sCache' % kind, {})
+            if not cache:
+                continue
+            common_data = (cache.get(current_key) or list(cache.values())[0])[kind]
+            break
+        else:
+            raise ExtractorError('Unable to find video data')
         video_data = common_data['video']
         if video_data.get('drm'):
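The NineNow change relies on Python's `for`/`else`: the `else` arm runs only when the loop completes without `break`, i.e. neither kind yielded usable data. A trimmed-down sketch with an invented `page_data` shape:

```python
def pick_common_data(page_data):
    # for/else: the else arm fires only if no kind produced data,
    # because a successful kind exits the loop via break
    for kind in ('episode', 'clip'):
        cache = page_data.get(kind, {}).get('%sCache' % kind, {})
        if not cache:
            continue
        common_data = list(cache.values())[0][kind]
        break
    else:
        raise ValueError('Unable to find video data')
    return common_data

# invented page data: only the clip cache is populated
data = {'clip': {'clipCache': {'k1': {'clip': {'video': {'id': 42}}}}}}
common = pick_common_data(data)
```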


@@ -429,7 +429,7 @@ class SchoolTVIE(InfoExtractor):
         display_id = self._match_id(url)
         webpage = self._download_webpage(url, display_id)
         video_id = self._search_regex(
-            r'data-mid=(["\'])(?P<id>.+?)\1', webpage, 'video_id', group='id')
+            r'data-mid=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage, 'video_id', group='id')
         return {
             '_type': 'url_transparent',
             'ie_key': 'NPO',


@@ -5,7 +5,7 @@ from .common import InfoExtractor
 class OktoberfestTVIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.oktoberfest-tv\.de/[^/]+/[^/]+/video/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?oktoberfest-tv\.de/[^/]+/[^/]+/video/(?P<id>[^/?#]+)'
     _TEST = {
         'url': 'http://www.oktoberfest-tv.de/de/kameras/video/hb-zelt',


@@ -13,7 +13,7 @@ from ..utils import (
 class OpenloadIE(InfoExtractor):
-    _VALID_URL = r'https://openload.(?:co|io)/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)'
+    _VALID_URL = r'https?://openload\.(?:co|io)/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)'
     _TESTS = [{
         'url': 'https://openload.co/f/kUEfGclsU9o',
@@ -60,7 +60,7 @@ class OpenloadIE(InfoExtractor):
             if j >= 33 and j <= 126:
                 j = ((j + 14) % 94) + 33
             if idx == len(enc_data) - 1:
-                j += 1
+                j += 3
             video_url_chars += compat_chr(j)
         video_url = 'https://openload.co/stream/%s?mime=true' % ''.join(video_url_chars)
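The per-character transform in the openload loop, `((j + 14) % 94) + 33` over the printable range 33-126, is exactly ROT47 (a shift of 47 over 94 printable characters), which is why applying it twice is the identity. A standalone sketch of just that rotation, without the last-character tweak the site layers on top:

```python
def rot47(s):
    # ((j + 14) % 94) + 33 == ((j - 33 + 47) % 94) + 33, i.e. ROT47;
    # characters outside 33-126 (like spaces) pass through unchanged
    out = []
    for c in s:
        j = ord(c)
        if 33 <= j <= 126:
            j = ((j + 14) % 94) + 33
        out.append(chr(j))
    return ''.join(out)
```

Because 47 + 47 = 94 wraps around, `rot47` is its own inverse.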


@@ -1,53 +1,40 @@
 from __future__ import unicode_literals
-import re
 from .common import InfoExtractor
 class ParliamentLiveUKIE(InfoExtractor):
     IE_NAME = 'parliamentlive.tv'
     IE_DESC = 'UK parliament videos'
-    _VALID_URL = r'https?://www\.parliamentlive\.tv/Main/Player\.aspx\?(?:[^&]+&)*?meetingId=(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?parliamentlive\.tv/Event/Index/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
     _TEST = {
-        'url': 'http://www.parliamentlive.tv/Main/Player.aspx?meetingId=15121&player=windowsmedia',
+        'url': 'http://parliamentlive.tv/Event/Index/c1e9d44d-fd6c-4263-b50f-97ed26cc998b',
         'info_dict': {
-            'id': '15121',
-            'ext': 'asf',
-            'title': 'hoc home affairs committee, 18 mar 2014.pm',
-            'description': 'md5:033b3acdf83304cd43946b2d5e5798d1',
+            'id': 'c1e9d44d-fd6c-4263-b50f-97ed26cc998b',
+            'ext': 'mp4',
+            'title': 'Home Affairs Committee',
+            'uploader_id': 'FFMPEG-01',
+            'timestamp': 1422696664,
+            'upload_date': '20150131',
         },
-        'params': {
-            'skip_download': True, # Requires mplayer (mms)
-        }
     }
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        webpage = self._download_webpage(url, video_id)
-        asx_url = self._html_search_regex(
-            r'embed.*?src="([^"]+)" name="MediaPlayer"', webpage,
-            'metadata URL')
-        asx = self._download_xml(asx_url, video_id, 'Downloading ASX metadata')
-        video_url = asx.find('.//REF').attrib['HREF']
-        title = self._search_regex(
-            r'''(?x)player\.setClipDetails\(
-                (?:(?:[0-9]+|"[^"]+"),\s*){2}
-                "([^"]+",\s*"[^"]+)"
-            ''',
-            webpage, 'title').replace('", "', ', ')
-        description = self._html_search_regex(
-            r'(?s)<span id="MainContentPlaceHolder_CaptionsBlock_WitnessInfo">(.*?)</span>',
-            webpage, 'description')
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(
+            'http://vodplayer.parliamentlive.tv/?mid=' + video_id, video_id)
+        widget_config = self._parse_json(self._search_regex(
+            r'kWidgetConfig\s*=\s*({.+});',
+            webpage, 'kaltura widget config'), video_id)
+        kaltura_url = 'kaltura:%s:%s' % (widget_config['wid'][1:], widget_config['entry_id'])
+        event_title = self._download_json(
+            'http://parliamentlive.tv/Event/GetShareVideo/' + video_id, video_id)['event']['title']
         return {
+            '_type': 'url_transparent',
             'id': video_id,
-            'ext': 'asf',
-            'url': video_url,
-            'title': title,
-            'description': description,
+            'title': event_title,
+            'description': '',
+            'url': kaltura_url,
+            'ie_key': 'Kaltura',
         }
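The rewritten ParliamentLive extractor hands off to the Kaltura extractor via an internal `kaltura:partner_id:entry_id` URL, stripping the leading underscore that the widget config keeps on the partner id. A tiny sketch with invented ids:

```python
# invented widget config; real partner and entry ids will differ
widget_config = {'wid': '_2000292', 'entry_id': '0_abcdefgh'}

# `wid` carries the Kaltura partner id with a leading underscore;
# drop it when assembling the internal kaltura: URL
kaltura_url = 'kaltura:%s:%s' % (
    widget_config['wid'][1:], widget_config['entry_id'])
```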


@@ -94,7 +94,7 @@ class PeriscopeIE(PeriscopeBaseIE):
 class PeriscopeUserIE(PeriscopeBaseIE):
-    _VALID_URL = r'https?://www\.periscope\.tv/(?P<id>[^/]+)/?$'
+    _VALID_URL = r'https?://(?:www\.)?periscope\.tv/(?P<id>[^/]+)/?$'
     IE_DESC = 'Periscope user videos'
     IE_NAME = 'periscope:user'


@@ -14,7 +14,7 @@ from ..utils import (
 class PlayvidIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.playvid\.com/watch(\?v=|/)(?P<id>.+?)(?:#|$)'
+    _VALID_URL = r'https?://(?:www\.)?playvid\.com/watch(\?v=|/)(?P<id>.+?)(?:#|$)'
     _TESTS = [{
         'url': 'http://www.playvid.com/watch/RnmBNgtrrJu',
         'md5': 'ffa2f6b2119af359f544388d8c01eb6c',


@@ -1,14 +1,17 @@
 # coding: utf-8
 from __future__ import unicode_literals
+import itertools
 import re
 from .common import InfoExtractor
 from ..compat import (
     compat_str,
     compat_urllib_parse_unquote,
+    compat_urlparse
 )
 from ..utils import (
+    extract_attributes,
     int_or_none,
     strip_or_none,
     unified_timestamp,
@@ -97,3 +100,81 @@ class PolskieRadioIE(InfoExtractor):
         description = strip_or_none(self._og_search_description(webpage))
         return self.playlist_result(entries, playlist_id, title, description)
+class PolskieRadioCategoryIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?polskieradio\.pl/\d+(?:,[^/]+)?/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://www.polskieradio.pl/7/5102,HISTORIA-ZYWA',
+        'info_dict': {
+            'id': '5102',
+            'title': 'HISTORIA ŻYWA',
+        },
+        'playlist_mincount': 38,
+    }, {
+        'url': 'http://www.polskieradio.pl/7/4807',
+        'info_dict': {
+            'id': '4807',
+            'title': 'Vademecum 1050. rocznicy Chrztu Polski'
+        },
+        'playlist_mincount': 5
+    }, {
+        'url': 'http://www.polskieradio.pl/7/129,Sygnaly-dnia?ref=source',
+        'only_matching': True
+    }, {
+        'url': 'http://www.polskieradio.pl/37,RedakcjaKatolicka/4143,Kierunek-Krakow',
+        'info_dict': {
+            'id': '4143',
+            'title': 'Kierunek Kraków',
+        },
+        'playlist_mincount': 61
+    }, {
+        'url': 'http://www.polskieradio.pl/10,czworka/214,muzyka',
+        'info_dict': {
+            'id': '214',
+            'title': 'Muzyka',
+        },
+        'playlist_mincount': 61
+    }, {
+        'url': 'http://www.polskieradio.pl/7,Jedynka/5102,HISTORIA-ZYWA',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.polskieradio.pl/8,Dwojka/196,Publicystyka',
+        'only_matching': True,
+    }]
+    @classmethod
+    def suitable(cls, url):
+        return False if PolskieRadioIE.suitable(url) else super(PolskieRadioCategoryIE, cls).suitable(url)
+    def _entries(self, url, page, category_id):
+        content = page
+        for page_num in itertools.count(2):
+            for a_entry, entry_id in re.findall(
+                    r'(?s)<article[^>]+>.*?(<a[^>]+href=["\']/\d+/\d+/Artykul/(\d+)[^>]+>).*?</article>',
+                    content):
+                entry = extract_attributes(a_entry)
+                href = entry.get('href')
+                if not href:
+                    continue
+                yield self.url_result(
+                    compat_urlparse.urljoin(url, href), PolskieRadioIE.ie_key(),
+                    entry_id, entry.get('title'))
+            mobj = re.search(
+                r'<div[^>]+class=["\']next["\'][^>]*>\s*<a[^>]+href=(["\'])(?P<url>(?:(?!\1).)+)\1',
+                content)
+            if not mobj:
+                break
+            next_url = compat_urlparse.urljoin(url, mobj.group('url'))
+            content = self._download_webpage(
+                next_url, category_id, 'Downloading page %s' % page_num)
+    def _real_extract(self, url):
+        category_id = self._match_id(url)
+        webpage = self._download_webpage(url, category_id)
+        title = self._html_search_regex(
+            r'<title>([^<]+) - [^<]+ - [^<]+</title>',
+            webpage, 'title', fatal=False)
+        return self.playlist_result(
+            self._entries(url, webpage, category_id),
+            category_id, title)
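The new category extractor pages lazily: a generator scrapes entries from the current page, then follows the "next" link, counting pages with `itertools.count(2)` since page 1 is already in hand. A self-contained sketch of that pagination shape, with two canned pages standing in for downloaded HTML:

```python
import itertools
import re

# canned pages standing in for downloaded category pages
PAGES = {
    1: '<article><a href="/7/129/Artykul/101">a</a></article>'
       '<div class="next"><a href="?page=2">next</a></div>',
    2: '<article><a href="/7/129/Artykul/102">b</a></article>',
}

def entries():
    content = PAGES[1]
    for page_num in itertools.count(2):
        # yield everything on the current page first
        for href in re.findall(r'href="(/\d+/\d+/Artykul/\d+)"', content):
            yield href
        # stop when the current page has no "next" link
        if not re.search(r'class="next"', content):
            break
        content = PAGES[page_num]

found = list(entries())
```

Because `entries()` is a generator, `playlist_result` can consume it without downloading every page up front.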


@@ -15,6 +15,7 @@ from ..compat import (
 from ..utils import (
     ExtractorError,
     int_or_none,
+    js_to_json,
     orderedSet,
     sanitized_Request,
     str_to_int,
@@ -48,6 +49,8 @@ class PornHubIE(InfoExtractor):
             'dislike_count': int,
             'comment_count': int,
             'age_limit': 18,
+            'tags': list,
+            'categories': list,
         },
     }, {
         # non-ASCII title
@@ -63,6 +66,8 @@ class PornHubIE(InfoExtractor):
             'dislike_count': int,
             'comment_count': int,
             'age_limit': 18,
+            'tags': list,
+            'categories': list,
         },
         'params': {
             'skip_download': True,
@@ -183,6 +188,15 @@ class PornHubIE(InfoExtractor):
             })
         self._sort_formats(formats)
+        page_params = self._parse_json(self._search_regex(
+            r'page_params\.zoneDetails\[([\'"])[^\'"]+\1\]\s*=\s*(?P<data>{[^}]+})',
+            webpage, 'page parameters', group='data', default='{}'),
+            video_id, transform_source=js_to_json, fatal=False)
+        tags = categories = None
+        if page_params:
+            tags = page_params.get('tags', '').split(',')
+            categories = page_params.get('categories', '').split(',')
         return {
             'id': video_id,
             'uploader': video_uploader,
@@ -195,6 +209,8 @@ class PornHubIE(InfoExtractor):
             'comment_count': comment_count,
             'formats': formats,
             'age_limit': 18,
+            'tags': tags,
+            'categories': categories,
         }
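The tags/categories addition extracts a JavaScript object literal from the page and runs it through `js_to_json` before parsing, since JS literals (single quotes, unquoted keys, trailing commas) are not valid JSON. A minimal stdlib sketch of the same flow; the `webpage` snippet is invented and the `js_to_json` here is a toy stand-in for youtube-dl's far more complete helper:

```python
import json
import re

# invented snippet in the shape the regex targets
webpage = "page_params.zoneDetails['video'] = {'tags': 'a,b', 'categories': 'c'};"

def js_to_json(code):
    # toy stand-in: only normalizes quotes; the real helper handles
    # unquoted keys, comments, trailing commas and more
    return code.replace("'", '"')

m = re.search(
    r'page_params\.zoneDetails\[([\'"])[^\'"]+\1\]\s*=\s*(?P<data>{[^}]+})',
    webpage)
page_params = json.loads(js_to_json(m.group('data'))) if m else {}
tags = page_params.get('tags', '').split(',')
```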


@@ -18,7 +18,7 @@ from ..utils import (
 class QQMusicIE(InfoExtractor):
     IE_NAME = 'qqmusic'
     IE_DESC = 'QQ音乐'
-    _VALID_URL = r'https?://y.qq.com/#type=song&mid=(?P<id>[0-9A-Za-z]+)'
+    _VALID_URL = r'https?://y\.qq\.com/#type=song&mid=(?P<id>[0-9A-Za-z]+)'
     _TESTS = [{
         'url': 'http://y.qq.com/#type=song&mid=004295Et37taLD',
         'md5': '9ce1c1c8445f561506d2e3cfb0255705',
@@ -172,7 +172,7 @@ class QQPlaylistBaseIE(InfoExtractor):
 class QQMusicSingerIE(QQPlaylistBaseIE):
     IE_NAME = 'qqmusic:singer'
     IE_DESC = 'QQ音乐 - 歌手'
-    _VALID_URL = r'https?://y.qq.com/#type=singer&mid=(?P<id>[0-9A-Za-z]+)'
+    _VALID_URL = r'https?://y\.qq\.com/#type=singer&mid=(?P<id>[0-9A-Za-z]+)'
     _TEST = {
         'url': 'http://y.qq.com/#type=singer&mid=001BLpXF2DyJe2',
         'info_dict': {
@@ -217,7 +217,7 @@ class QQMusicSingerIE(QQPlaylistBaseIE):
 class QQMusicAlbumIE(QQPlaylistBaseIE):
     IE_NAME = 'qqmusic:album'
     IE_DESC = 'QQ音乐 - 专辑'
-    _VALID_URL = r'https?://y.qq.com/#type=album&mid=(?P<id>[0-9A-Za-z]+)'
+    _VALID_URL = r'https?://y\.qq\.com/#type=album&mid=(?P<id>[0-9A-Za-z]+)'
     _TESTS = [{
         'url': 'http://y.qq.com/#type=album&mid=000gXCTb2AhRR1',


@@ -13,6 +13,7 @@ from ..utils import (
     xpath_element,
     ExtractorError,
     determine_protocol,
+    unsmuggle_url,
 )
@@ -35,28 +36,51 @@ class RadioCanadaIE(InfoExtractor):
     }
     def _real_extract(self, url):
+        url, smuggled_data = unsmuggle_url(url, {})
         app_code, video_id = re.match(self._VALID_URL, url).groups()
-        device_types = ['ipad', 'android']
+        metadata = self._download_xml(
+            'http://api.radio-canada.ca/metaMedia/v1/index.ashx',
+            video_id, note='Downloading metadata XML', query={
+                'appCode': app_code,
+                'idMedia': video_id,
+            })
+        def get_meta(name):
+            el = find_xpath_attr(metadata, './/Meta', 'name', name)
+            return el.text if el is not None else None
+        if get_meta('protectionType'):
+            raise ExtractorError('This video is DRM protected.', expected=True)
+        device_types = ['ipad']
         if app_code != 'toutv':
             device_types.append('flash')
+        if not smuggled_data:
+            device_types.append('android')
         formats = []
         # TODO: extract f4m formats
         # f4m formats can be extracted using flashhd device_type but they produce unplayable file
         for device_type in device_types:
-            v_data = self._download_xml(
-                'http://api.radio-canada.ca/validationMedia/v1/Validation.ashx',
-                video_id, note='Downloading %s XML' % device_type, query={
-                    'appCode': app_code,
-                    'idMedia': video_id,
-                    'connectionType': 'broadband',
-                    'multibitrate': 'true',
-                    'deviceType': device_type,
+            validation_url = 'http://api.radio-canada.ca/validationMedia/v1/Validation.ashx'
+            query = {
+                'appCode': app_code,
+                'idMedia': video_id,
+                'connectionType': 'broadband',
+                'multibitrate': 'true',
+                'deviceType': device_type,
+            }
+            if smuggled_data:
+                validation_url = 'https://services.radio-canada.ca/media/validation/v2/'
+                query.update(smuggled_data)
+            else:
+                query.update({
                     # paysJ391wsHjbOJwvCs26toz and bypasslock are used to bypass geo-restriction
                     'paysJ391wsHjbOJwvCs26toz': 'CA',
                     'bypasslock': 'NZt5K62gRqfc',
-                }, fatal=False)
+                })
+            v_data = self._download_xml(validation_url, video_id, note='Downloading %s XML' % device_type, query=query, fatal=False)
             v_url = xpath_text(v_data, 'url')
             if not v_url:
                 continue
@@ -101,17 +125,6 @@ class RadioCanadaIE(InfoExtractor):
                 f4m_id='hds', fatal=False))
         self._sort_formats(formats)
-        metadata = self._download_xml(
-            'http://api.radio-canada.ca/metaMedia/v1/index.ashx',
-            video_id, note='Downloading metadata XML', query={
-                'appCode': app_code,
-                'idMedia': video_id,
-            })
-        def get_meta(name):
-            el = find_xpath_attr(metadata, './/Meta', 'name', name)
-            return el.text if el is not None else None
         return {
             'id': video_id,
             'title': get_meta('Title'),
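The `unsmuggle_url` call lets another extractor pass side-channel data (here, extra validation query parameters) along with a plain URL, by smuggling it in the URL fragment. A simplified round-trip sketch; youtube-dl's real pair lives in `utils.py` and encodes the payload differently:

```python
import json
from urllib.parse import quote, unquote

# simplified sketch, not youtube-dl's actual encoding
def smuggle_url(url, data):
    return url + '#__youtubedl_smuggle=' + quote(json.dumps(data))

def unsmuggle_url(url, default=None):
    if '#__youtubedl_smuggle=' not in url:
        return url, default
    url, _, payload = url.partition('#__youtubedl_smuggle=')
    return url, json.loads(unquote(payload))

url, data = unsmuggle_url(
    smuggle_url('http://example.com/v/1', {'appCode': 'toutv'}))
```

A caller that receives a non-smuggled URL simply gets back the URL and the default, which is why the extractor can branch on `if smuggled_data:`.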


@@ -5,7 +5,7 @@ from .internetvideoarchive import InternetVideoArchiveIE
 class RottenTomatoesIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.rottentomatoes\.com/m/[^/]+/trailers/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?rottentomatoes\.com/m/[^/]+/trailers/(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.rottentomatoes.com/m/toy_story_3/trailers/11028566/',


@@ -7,7 +7,7 @@ from ..utils import unified_strdate, determine_ext
 class RoxwelIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.roxwel\.com/player/(?P<filename>.+?)(\.|\?|$)'
+    _VALID_URL = r'https?://(?:www\.)?roxwel\.com/player/(?P<filename>.+?)(\.|\?|$)'
     _TEST = {
         'url': 'http://www.roxwel.com/player/passionpittakeawalklive.html',

View File

@@ -64,7 +64,7 @@ def _decrypt_url(png):
 class RTVEALaCartaIE(InfoExtractor):
     IE_NAME = 'rtve.es:alacarta'
     IE_DESC = 'RTVE a la carta'
-    _VALID_URL = r'https?://www\.rtve\.es/(m/)?(alacarta/videos|filmoteca)/[^/]+/[^/]+/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?rtve\.es/(m/)?(alacarta/videos|filmoteca)/[^/]+/[^/]+/(?P<id>\d+)'
     _TESTS = [{
         'url': 'http://www.rtve.es/alacarta/videos/balonmano/o-swiss-cup-masculina-final-espana-suecia/2491869/',
@@ -184,7 +184,7 @@ class RTVEInfantilIE(InfoExtractor):
 class RTVELiveIE(InfoExtractor):
     IE_NAME = 'rtve.es:live'
     IE_DESC = 'RTVE.es live streams'
-    _VALID_URL = r'https?://www\.rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)'
+    _VALID_URL = r'https?://(?:www\.)?rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)'
     _TESTS = [{
         'url': 'http://www.rtve.es/directo/la-1/',
@@ -226,7 +226,7 @@ class RTVELiveIE(InfoExtractor):
 class RTVETelevisionIE(InfoExtractor):
     IE_NAME = 'rtve.es:television'
-    _VALID_URL = r'https?://www\.rtve\.es/television/[^/]+/[^/]+/(?P<id>\d+).shtml'
+    _VALID_URL = r'https?://(?:www\.)?rtve\.es/television/[^/]+/[^/]+/(?P<id>\d+).shtml'
     _TEST = {
         'url': 'http://www.rtve.es/television/20160628/revolucion-del-movil/1364141.shtml',


@@ -103,13 +103,13 @@ class SafariIE(SafariBaseIE):
         webpage = self._download_webpage(url, video_id)
         reference_id = self._search_regex(
-            r'data-reference-id=(["\'])(?P<id>.+?)\1',
+            r'data-reference-id=(["\'])(?P<id>(?:(?!\1).)+)\1',
             webpage, 'kaltura reference id', group='id')
         partner_id = self._search_regex(
-            r'data-partner-id=(["\'])(?P<id>.+?)\1',
+            r'data-partner-id=(["\'])(?P<id>(?:(?!\1).)+)\1',
            webpage, 'kaltura widget id', group='id')
         ui_id = self._search_regex(
-            r'data-ui-id=(["\'])(?P<id>.+?)\1',
+            r'data-ui-id=(["\'])(?P<id>(?:(?!\1).)+)\1',
             webpage, 'kaltura uiconf id', group='id')
         query = {


@@ -11,7 +11,7 @@ from ..utils import (
 class ScreenJunkiesIE(InfoExtractor):
-    _VALID_URL = r'https?://www.screenjunkies.com/video/(?P<display_id>[^/]+?)(?:-(?P<id>\d+))?(?:[/?#&]|$)'
+    _VALID_URL = r'https?://(?:www\.)?screenjunkies\.com/video/(?P<display_id>[^/]+?)(?:-(?P<id>\d+))?(?:[/?#&]|$)'
     _TESTS = [{
         'url': 'http://www.screenjunkies.com/video/best-quentin-tarantino-movie-2841915',
         'md5': '5c2b686bec3d43de42bde9ec047536b0',


@@ -48,7 +48,7 @@ class SenateISVPIE(InfoExtractor):
         ['arch', '', 'http://ussenate-f.akamaihd.net/']
     ]
     _IE_NAME = 'senate.gov'
-    _VALID_URL = r'https?://www\.senate\.gov/isvp/?\?(?P<qs>.+)'
+    _VALID_URL = r'https?://(?:www\.)?senate\.gov/isvp/?\?(?P<qs>.+)'
     _TESTS = [{
         'url': 'http://www.senate.gov/isvp/?comm=judiciary&type=live&stt=&filename=judiciary031715&auto_play=false&wmode=transparent&poster=http%3A%2F%2Fwww.judiciary.senate.gov%2Fthemes%2Fjudiciary%2Fimages%2Fvideo-poster-flash-fit.png',
         'info_dict': {


@@ -14,7 +14,7 @@ from ..utils import (
 class SlideshareIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.slideshare\.net/[^/]+?/(?P<title>.+?)($|\?)'
+    _VALID_URL = r'https?://(?:www\.)?slideshare\.net/[^/]+?/(?P<title>.+?)($|\?)'
     _TEST = {
         'url': 'http://www.slideshare.net/Dataversity/keynote-presentation-managing-scale-and-complexity',


@@ -103,7 +103,7 @@ class SpiegelIE(InfoExtractor):
 class SpiegelArticleIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.spiegel\.de/(?!video/)[^?#]*?-(?P<id>[0-9]+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?spiegel\.de/(?!video/)[^?#]*?-(?P<id>[0-9]+)\.html'
     IE_NAME = 'Spiegel:Article'
     IE_DESC = 'Articles on spiegel.de'
     _TESTS = [{


@@ -16,7 +16,7 @@ class SVTBaseIE(InfoExtractor):
     def _extract_video(self, video_info, video_id):
         formats = []
         for vr in video_info['videoReferences']:
-            player_type = vr.get('playerType')
+            player_type = vr.get('playerType') or vr.get('format')
             vurl = vr['url']
             ext = determine_ext(vurl)
             if ext == 'm3u8':


@@ -8,7 +8,7 @@ from ..utils import (
 class SyfyIE(AdobePassIE):
-    _VALID_URL = r'https?://www\.syfy\.com/(?:[^/]+/)?videos/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?syfy\.com/(?:[^/]+/)?videos/(?P<id>[^/?#]+)'
     _TESTS = [{
         'url': 'http://www.syfy.com/theinternetruinedmylife/videos/the-internet-ruined-my-life-season-1-trailer',
         'info_dict': {


@@ -7,7 +7,7 @@ from .ooyala import OoyalaIE
 class TeachingChannelIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.teachingchannel\.org/videos/(?P<title>.+)'
+    _VALID_URL = r'https?://(?:www\.)?teachingchannel\.org/videos/(?P<title>.+)'
     _TEST = {
         'url': 'https://www.teachingchannel.org/videos/teacher-teaming-evolution',


@@ -6,7 +6,7 @@ from .mitele import MiTeleBaseIE
 class TelecincoIE(MiTeleBaseIE):
     IE_DESC = 'telecinco.es, cuatro.com and mediaset.es'
-    _VALID_URL = r'https?://www\.(?:telecinco\.es|cuatro\.com|mediaset\.es)/(?:[^/]+/)+(?P<id>.+?)\.html'
+    _VALID_URL = r'https?://(?:www\.)?(?:telecinco\.es|cuatro\.com|mediaset\.es)/(?:[^/]+/)+(?P<id>.+?)\.html'
     _TESTS = [{
         'url': 'http://www.telecinco.es/robinfood/temporada-01/t01xp14/Bacalao-cocochas-pil-pil_0_1876350223.html',


@@ -0,0 +1,36 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class TeleQuebecIE(InfoExtractor):
+    _VALID_URL = r'https?://zonevideo\.telequebec\.tv/media/(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://zonevideo.telequebec.tv/media/20984/le-couronnement-de-new-york/couronnement-de-new-york',
+        'md5': 'fe95a0957e5707b1b01f5013e725c90f',
+        'info_dict': {
+            'id': '20984',
+            'ext': 'mp4',
+            'title': 'Le couronnement de New York',
+            'description': 'md5:f5b3d27a689ec6c1486132b2d687d432',
+            'upload_date': '20160220',
+            'timestamp': 1455965438,
+        }
+    }
+
+    def _real_extract(self, url):
+        media_id = self._match_id(url)
+
+        media_data = self._download_json(
+            'https://mnmedias.api.telequebec.tv/api/v2/media/' + media_id,
+            media_id)['media']
+
+        return {
+            '_type': 'url_transparent',
+            'id': media_id,
+            'url': 'limelight:media:' + media_data['streamInfo']['sourceId'],
+            'title': media_data['title'],
+            'description': media_data.get('descriptions', [{'text': None}])[0].get('text'),
+            'duration': int_or_none(media_data.get('durationInMilliseconds'), 1000),
+            'ie_key': 'LimelightMedia',
+        }
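TeleQuebecIE passes a scale of 1000 to `int_or_none` so that `durationInMilliseconds` becomes a duration in seconds, while a missing field stays `None`. A minimal sketch of that helper's behavior (simplified reimplementation for illustration, not the youtube-dl source):

```python
def int_or_none(v, scale=1, default=None):
    # Sketch of the utility's contract: coerce v to int and divide by scale,
    # falling back to default when v is absent or not numeric.
    if v is None:
        return default
    try:
        return int(v) // scale
    except (ValueError, TypeError):
        return default

assert int_or_none('243000', 1000) == 243  # milliseconds -> seconds
assert int_or_none(None, 1000) is None     # absent field tolerated
```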


@@ -5,7 +5,7 @@ from .common import InfoExtractor
 class TelewebionIE(InfoExtractor):
-    _VALID_URL = r'https?://www\.telewebion\.com/#!/episode/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?telewebion\.com/#!/episode/(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.telewebion.com/#!/episode/1263668/',


@@ -0,0 +1,53 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import json
+
+from .common import InfoExtractor
+from ..utils import (
+    HEADRequest,
+    ExtractorError,
+    int_or_none,
+)
+
+
+class TFOIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?tfo\.org/(?:en|fr)/(?:[^/]+/){2}(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://www.tfo.org/en/universe/tfo-247/100463871/video-game-hackathon',
+        'md5': '47c987d0515561114cf03d1226a9d4c7',
+        'info_dict': {
+            'id': '100463871',
+            'ext': 'mp4',
+            'title': 'Video Game Hackathon',
+            'description': 'md5:558afeba217c6c8d96c60e5421795c07',
+            'upload_date': '20160212',
+            'timestamp': 1455310233,
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        self._request_webpage(HEADRequest('http://www.tfo.org/'), video_id)
+        infos = self._download_json(
+            'http://www.tfo.org/api/web/video/get_infos', video_id, data=json.dumps({
+                'product_id': video_id,
+            }).encode(), headers={
+                'X-tfo-session': self._get_cookies('http://www.tfo.org/')['tfo-session'].value,
+            })
+        if infos.get('success') == 0:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, infos['msg']), expected=True)
+        video_data = infos['data']
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'url': 'limelight:media:' + video_data['llid'],
+            'title': video_data['title'],
+            'description': video_data.get('description'),
+            'series': video_data.get('collection'),
+            'season_number': int_or_none(video_data.get('season')),
+            'episode_number': int_or_none(video_data.get('episode')),
+            'duration': int_or_none(video_data.get('duration')),
+            'ie_key': 'LimelightMedia',
+        }
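The TFO extractor first issues a HEAD request to obtain a session cookie, then POSTs a JSON body with that cookie echoed in an `X-tfo-session` header. A minimal sketch of how that payload and header set are assembled (no network call; `build_tfo_request` and the session value are illustrative, not part of the extractor):

```python
import json

def build_tfo_request(video_id, session_cookie):
    # Mirror the request shape in TFOIE._real_extract: a UTF-8 JSON body
    # carrying the product id, plus the session cookie in a custom header.
    data = json.dumps({'product_id': video_id}).encode()
    headers = {'X-tfo-session': session_cookie}
    return data, headers

data, headers = build_tfo_request('100463871', 'dummy-session-value')
assert data == b'{"product_id": "100463871"}'
assert headers['X-tfo-session'] == 'dummy-session-value'
```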


@@ -11,7 +11,7 @@ from ..utils import (
 class TheInterceptIE(InfoExtractor):
-    _VALID_URL = r'https://theintercept.com/fieldofvision/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://theintercept\.com/fieldofvision/(?P<id>[^/?#]+)'
     _TESTS = [{
         'url': 'https://theintercept.com/fieldofvision/thisisacoup-episode-four-surrender-or-die/',
         'md5': '145f28b41d44aab2f87c0a4ac8ec95bd',


@@ -7,7 +7,7 @@ from ..utils import qualities
 class TheSceneIE(InfoExtractor):
-    _VALID_URL = r'https://thescene\.com/watch/[^/]+/(?P<id>[^/#?]+)'
+    _VALID_URL = r'https?://thescene\.com/watch/[^/]+/(?P<id>[^/#?]+)'
     _TEST = {
         'url': 'https://thescene.com/watch/vogue/narciso-rodriguez-spring-2013-ready-to-wear',
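The TheInterceptIE and TheSceneIE hunks make a different relaxation from the `(?:www\.)?` ones: `https?` lets the trailing `s` be optional, so plain-HTTP URLs also match. A quick standalone check using the new TheScene pattern (URL path is a made-up example):

```python
import re

# New pattern from the hunk above: https? matches both schemes.
pattern = re.compile(r'https?://thescene\.com/watch/[^/]+/(?P<id>[^/#?]+)')

assert pattern.match('https://thescene.com/watch/vogue/some-video')
assert pattern.match('http://thescene.com/watch/vogue/some-video')  # now accepted
assert pattern.match('https://thescene.com/watch/vogue/some-video').group('id') == 'some-video'
```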

Some files were not shown because too many files have changed in this diff.