Compare commits

...

53 Commits

Author SHA1 Message Date
Sergey M․
f031b76065 release 2017.08.27 2017-08-27 04:28:04 +07:00
Sergey M․
62c06c593d [ChangeLog] Actualize 2017-08-27 04:27:19 +07:00
Sergey M․
ff17be3ac9 [extractor/generic] Extract from LD-JSON last of all
Previous sources may contain several formats, e.g. http://tamasha.com/v/PgGZ
2017-08-27 03:31:40 +07:00
Sergey M․
1ed4549942 [extractor/common] Extract format id from label attribute of source tag for HTML5 videos (#14034) 2017-08-27 03:27:05 +07:00
Sergey M․
dd121cc1ca [extractor/common] Extract height from res attribute of source tag for HTML5 videos (closes #14034) 2017-08-27 03:12:56 +07:00
Sergey M․
a3c3a1e128 [http] Rework HTTP downloader
* Simplify code and split into separate routines to facilitate maintaining
* Make retry mechanism work on errors during actual download not only during connection establishment phase
* Retry on ECONNRESET and ETIMEDOUT during reading data from network
* Retry on content too short and various timeout errors
* Show error description on retry
* Closes #506, closes #809, closes #2849, closes #4240, closes #6023, closes #8625, closes #9483
2017-08-27 02:22:30 +07:00
Sergey M․
085d9dd9be [rai] Fix audio formats extraction (closes #14024) 2017-08-26 22:02:49 +07:00
Vijay Singh
151978f38a [mixcloud] Fix extraction (closes #14020) 2017-08-26 19:32:57 +07:00
Sergey M․
c7121fa7b8 [youtube] Fix controversy videos extraction (closes #14027, closes #14029) 2017-08-26 15:38:38 +07:00
Vijay Singh
745968bc72 [mixcloud] Fix extraction (closes #14015) 2017-08-24 22:28:44 +07:00
Sergey M․
df235dbba8 release 2017.08.23 2017-08-23 23:23:13 +07:00
Sergey M․
c4bdc68113 [ChangeLog] Actualize 2017-08-23 23:21:19 +07:00
Sergey M․
5bae33485c [toutv] PEP 8 2017-08-23 22:50:00 +07:00
Sergey M․
0830f3e048 [cbc:watch] Bypass geo-restriction (closes #13993) 2017-08-23 22:45:45 +07:00
Sergey M․
8d7a24aff6 [toutv] Relax DRM check (closes #13994) 2017-08-23 22:28:09 +07:00
Sergey M․
37d9af306a [googledrive] Simplify and carry long lines (#13638) 2017-08-23 00:33:53 +07:00
Sergey M․
e01c3d2ef7 [extractor/common] Introduce _parse_xml 2017-08-23 00:32:41 +07:00
Parmjit Virk
05915e379a [googledrive] Add support for subtitles (fixes #13619) 2017-08-22 23:48:59 +07:00
Yen Chi Hsuan
7b67b60773 Merge pull request #13669 from bmwiedemann/master
[build] Override timestamps in zip file
2017-08-22 21:51:20 +08:00
Sergey M․
8d9c2a681a [pornhub] Relax uploader regex (closes #13906, closes #13975) 2017-08-21 23:06:27 +07:00
Alan Yee
903d4d1625 [README.md] Switch to HTTPS URLs 2017-08-20 23:35:39 +07:00
Luca Steeb
8239c6791a [bandcamp:album] Extract track titles 2017-08-20 23:32:33 +07:00
Sergey M․
b359e977b9 [extractor/common] Make HLS and DASH extraction non fatal in _parse_html5_media_entries (closes #13970) 2017-08-20 14:16:58 +07:00
Bernhard M. Wiedemann
305d99f0bd [build] Override timestamps in zip file
to make build reproducible.
See https://reproducible-builds.org/ for why this is good

Copying files to not interfere with freshness detection.
2017-08-19 21:43:48 +02:00
Sergey M․
d3d45e0a45 [bbccouk] Add support for events URLs (closes #13893) 2017-08-19 23:54:15 +07:00
Yen Chi Hsuan
381ad4f309 [liveleak] Support multi-video pages (closes #6542) 2017-08-19 22:48:00 +08:00
Yen Chi Hsuan
e2481b9b6e [ChangeLog] Fix 2017-08-19 22:28:58 +08:00
Yen Chi Hsuan
09747ba766 [liveleak] Support another liveleak embedding pattern (closes #13336) 2017-08-19 22:28:13 +08:00
Yen Chi Hsuan
f8f18f332f [cda] Fix extraction (closes #13935) 2017-08-19 21:44:47 +08:00
Yen Chi Hsuan
95f3f7c20a [utils] Fix unescapeHTML for misformed string like "&a"" (#13935) 2017-08-19 21:40:53 +08:00
Sergey M․
f5469da9e6 [laola1tv] Add support for tv.ittf.com (closes #13965) 2017-08-19 19:48:20 +07:00
Sergey M․
d14d9d8903 [mixcloud] Fix extraction (closes #13958) 2017-08-18 23:31:42 +07:00
Sergey M․
ea004d34f8 release 2017.08.18 2017-08-18 01:05:27 +07:00
Sergey M․
2738965d98 [ChangeLog] Actualize 2017-08-18 01:03:20 +07:00
Sergey M․
4a91910365 [qqmusic:toplist] PEP 8 2017-08-18 01:00:07 +07:00
Sergey M․
c0892b2b46 [arte] Detect unavailable videos (closes #13945) 2017-08-18 00:58:23 +07:00
Sergey M․
a5ac0c4755 [YoutubeDL] Sanitize byte string format URLs (#13951) 2017-08-17 23:59:12 +07:00
Sergey M․
5551d7714d [generic] Convert redirect URLs to unicode strings (closes #13951) 2017-08-17 23:58:01 +07:00
Sergey M․
5f5c7b92dd [udemy] Fix paid course detection (#13943) 2017-08-17 23:14:46 +07:00
Sergey M․
93d0583e34 [pluralsight] Use RPC API for course extraction (closes #13937) 2017-08-17 22:45:40 +07:00
Yen Chi Hsuan
5d28169747 Credit Genki Sky for clippit (bfabd17b33) 2017-08-17 21:21:17 +08:00
Yen Chi Hsuan
7ddab7742c [ChangeLog] Add an entry for Genki Sky's patch 2017-08-17 16:56:37 +08:00
Genki Sky
bfabd17b33 Add new extractor 2017-08-17 16:56:06 +08:00
Yen Chi Hsuan
12f5304556 [ChangeLog] Add entry for #13805 2017-08-17 16:40:56 +08:00
Yen Chi Hsuan
25a6e769a1 [qqmusic] Fix tests and cleanup 2017-08-17 16:39:57 +08:00
Yen Chi Hsuan
d22b67f356 Merge pull request #13805 from gam2046/master
Fix QQ Music url changed
2017-08-17 16:11:35 +08:00
Sergey M․
a1aa659662 [periscope] Renew HLS extraction (closes #13917) 2017-08-16 23:03:42 +07:00
Sergey M․
4850478543 [extractor/common] Add support for float durations in _parse_mpd_formats (closes #13919) 2017-08-15 23:58:00 +07:00
forDream
134d85a7bd [qqmusic] review 2017-08-15 13:14:35 +08:00
forDream
5c037c0d1f [qqmusic]support QQMusicSingerIE 2017-08-15 13:14:35 +08:00
forDream
5d1bd3b907 [qqmusic]update valid url 2017-08-15 13:14:34 +08:00
forDream
19ada898dc fix QQ Music Url changed 2017-08-15 13:14:34 +08:00
Sergey M․
da20951a57 [mixcloud] Extract decrypt key 2017-08-14 22:39:05 +07:00
36 changed files with 967 additions and 369 deletions

View File

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.08.13*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.08.13**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.08.27*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.08.27**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.08.13
[debug] youtube-dl version 2017.08.27
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@@ -223,3 +223,4 @@ Jan Kundrát
Giuseppe Fabiano
Örn Guðjónsson
Parmjit Virk
Genki Sky

View File

@@ -3,7 +3,7 @@
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Command-line args: [u'-v', u'https://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2015.12.06
[debug] Git HEAD: 135392e
@@ -34,7 +34,7 @@ For bug reports, this means that your report should contain the *complete* outpu
If your server has multiple IPs or you suspect censorship, adding `--call-home` may be a good idea to get more diagnostics. If the error is `ERROR: Unable to extract ...` and you cannot reproduce it from multiple countries, add `--dump-pages` (warning: this will yield a rather large output, redirect it to the file `log.txt` by adding `>log.txt 2>&1` to your command-line) or upload the `.dump` files you get when you add `--write-pages` [somewhere](https://gist.github.com/).
**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `http://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `http://www.youtube.com/`) is *not* an example URL.
**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `https://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `https://www.youtube.com/`) is *not* an example URL.
### Are you using the latest version?
@@ -70,7 +70,7 @@ It may sound strange, but some bug reports we receive are completely unrelated t
# DEVELOPER INSTRUCTIONS
Most users do not need to build youtube-dl and can [download the builds](http://rg3.github.io/youtube-dl/download.html) or get them from their distribution.
Most users do not need to build youtube-dl and can [download the builds](https://rg3.github.io/youtube-dl/download.html) or get them from their distribution.
To run youtube-dl as a developer, you don't need to build anything either. Simply execute
@@ -118,7 +118,7 @@ After you have ensured this site is distributing its content legally, you can fo
class YourExtractorIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://yourextractor.com/watch/42',
'url': 'https://yourextractor.com/watch/42',
'md5': 'TODO: md5 sum of the first 10241 bytes of the video file (use --test)',
'info_dict': {
'id': '42',
@@ -151,8 +151,8 @@ After you have ensured this site is distributing its content legally, you can fo
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py

View File

@@ -1,3 +1,63 @@
version 2017.08.27
Core
+ [extractor/common] Extract height and format id for HTML5 videos (#14034)
* [downloader/http] Rework HTTP downloader (#506, #809, #2849, #4240, #6023,
#8625, #9483)
* Simplify code and split into separate routines to facilitate maintaining
* Make retry mechanism work on errors during actual download not only
during connection establishment phase
* Retry on ECONNRESET and ETIMEDOUT during reading data from network
* Retry on content too short
* Show error description on retry
Extractors
* [generic] Lower preference for extraction from LD-JSON
* [rai] Fix audio formats extraction (#14024)
* [youtube] Fix controversy videos extraction (#14027, #14029)
* [mixcloud] Fix extraction (#14015, #14020)
version 2017.08.23
Core
+ [extractor/common] Introduce _parse_xml
* [extractor/common] Make HLS and DASH extraction in_parse_html5_media_entries
non fatal (#13970)
* [utils] Fix unescapeHTML for misformed string like "&a&quot;" (#13935)
Extractors
* [cbc:watch] Bypass geo restriction (#13993)
* [toutv] Relax DRM check (#13994)
+ [googledrive] Add support for subtitles (#13619, #13638)
* [pornhub] Relax uploader regular expression (#13906, #13975)
* [bandcamp:album] Extract track titles (#13962)
+ [bbccouk] Add support for events URLs (#13893)
+ [liveleak] Support multi-video pages (#6542)
+ [liveleak] Support another liveleak embedding pattern (#13336)
* [cda] Fix extraction (#13935)
+ [laola1tv] Add support for tv.ittf.com (#13965)
* [mixcloud] Fix extraction (#13958, #13974, #13980, #14003)
version 2017.08.18
Core
* [YoutubeDL] Sanitize byte string format URLs (#13951)
+ [extractor/common] Add support for float durations in _parse_mpd_formats
(#13919)
Extractors
* [arte] Detect unavailable videos (#13945)
* [generic] Convert redirect URLs to unicode strings (#13951)
* [udemy] Fix paid course detection (#13943)
* [pluralsight] Use RPC API for course extraction (#13937)
+ [clippit] Add support for clippituser.tv
+ [qqmusic] Support new URL schemes (#13805)
* [periscope] Renew HLS extraction (#13917)
* [mixcloud] Extract decrypt key
version 2017.08.13
Core
@@ -100,7 +160,7 @@ Extractors
* [youku:show] Fix playlist extraction (#13248)
+ [dispeak] Recognize sevt subdomain (#13276)
* [adn] Improve error reporting (#13663)
* [crunchyroll] Relax series and season regex (#13659)
* [crunchyroll] Relax series and season regular expression (#13659)
+ [spiegel:article] Add support for nexx iframe embeds (#13029)
+ [nexx:embed] Add support for iframe embeds
* [nexx] Improve JS embed extraction

View File

@@ -46,8 +46,15 @@ tar: youtube-dl.tar.gz
pypi-files: youtube-dl.bash-completion README.txt youtube-dl.1 youtube-dl.fish
youtube-dl: youtube_dl/*.py youtube_dl/*/*.py
zip --quiet youtube-dl youtube_dl/*.py youtube_dl/*/*.py
zip --quiet --junk-paths youtube-dl youtube_dl/__main__.py
mkdir -p zip
for d in youtube_dl youtube_dl/downloader youtube_dl/extractor youtube_dl/postprocessor ; do \
mkdir -p zip/$$d ;\
cp -a $$d/*.py zip/$$d/ ;\
done
touch -t 200001010101 zip/youtube_dl/*.py zip/youtube_dl/*/*.py
mv zip/youtube_dl/__main__.py zip/
cd zip ; zip --quiet ../youtube-dl youtube_dl/*.py youtube_dl/*/*.py __main__.py
rm -rf zip
echo '#!$(PYTHON)' > youtube-dl
cat youtube-dl.zip >> youtube-dl
rm youtube-dl.zip

View File

@@ -25,7 +25,7 @@ If you do not have curl, you can alternatively use a recent wget:
sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in any location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29) except for `%SYSTEMROOT%\System32` (e.g. **do not** put in `C:\Windows\System32`).
Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in any location on their [PATH](https://en.wikipedia.org/wiki/PATH_%28variable%29) except for `%SYSTEMROOT%\System32` (e.g. **do not** put in `C:\Windows\System32`).
You can also use pip:
@@ -33,7 +33,7 @@ You can also use pip:
This command will update youtube-dl if you have already installed it. See the [pypi page](https://pypi.python.org/pypi/youtube_dl) for more information.
OS X users can install youtube-dl with [Homebrew](http://brew.sh/):
OS X users can install youtube-dl with [Homebrew](https://brew.sh/):
brew install youtube-dl
@@ -458,7 +458,7 @@ You can also use `--config-location` if you want to use custom configuration fil
### Authentication with `.netrc` file
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on a per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by only you:
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](https://stackoverflow.com/tags/.netrc/info) on a per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by only you:
```
touch $HOME/.netrc
chmod a-rwx,u+rw $HOME/.netrc
@@ -485,7 +485,7 @@ The `-o` option allows users to indicate a template for the output file names.
**tl;dr:** [navigate me to examples](#output-template-examples).
The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "http://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by a formatting operations. Allowed names along with sequence type are:
The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "https://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by a formatting operations. Allowed names along with sequence type are:
- `id` (string): Video identifier
- `title` (string): Video title
@@ -603,7 +603,7 @@ $ youtube-dl -o '%(uploader)s/%(playlist)s/%(playlist_index)s - %(title)s.%(ext)
$ youtube-dl -u user -p password -o '~/MyVideos/%(playlist)s/%(chapter_number)s - %(chapter)s/%(title)s.%(ext)s' https://www.udemy.com/java-tutorial/
# Download entire series season keeping each series and each season in separate directory under C:/MyVideos
$ youtube-dl -o "C:/MyVideos/%(series)s/%(season_number)s - %(season)s/%(episode_number)s - %(episode)s.%(ext)s" http://videomore.ru/kino_v_detalayah/5_sezon/367617
$ youtube-dl -o "C:/MyVideos/%(series)s/%(season_number)s - %(season)s/%(episode_number)s - %(episode)s.%(ext)s" https://videomore.ru/kino_v_detalayah/5_sezon/367617
# Stream the video being downloaded to stdout
$ youtube-dl -o - BaW_jenozKc
@@ -716,17 +716,17 @@ $ youtube-dl --dateafter 20000101 --datebefore 20091231
### How do I update youtube-dl?
If you've followed [our manual installation instructions](http://rg3.github.io/youtube-dl/download.html), you can simply run `youtube-dl -U` (or, on Linux, `sudo youtube-dl -U`).
If you've followed [our manual installation instructions](https://rg3.github.io/youtube-dl/download.html), you can simply run `youtube-dl -U` (or, on Linux, `sudo youtube-dl -U`).
If you have used pip, a simple `sudo pip install -U youtube-dl` is sufficient to update.
If you have installed youtube-dl using a package manager like *apt-get* or *yum*, use the standard system update mechanism to update. Note that distribution packages are often outdated. As a rule of thumb, youtube-dl releases at least once a month, and often weekly or even daily. Simply go to http://yt-dl.org/ to find out the current version. Unfortunately, there is nothing we youtube-dl developers can do if your distribution serves a really outdated version. You can (and should) complain to your distribution in their bugtracker or support forum.
If you have installed youtube-dl using a package manager like *apt-get* or *yum*, use the standard system update mechanism to update. Note that distribution packages are often outdated. As a rule of thumb, youtube-dl releases at least once a month, and often weekly or even daily. Simply go to https://yt-dl.org to find out the current version. Unfortunately, there is nothing we youtube-dl developers can do if your distribution serves a really outdated version. You can (and should) complain to your distribution in their bugtracker or support forum.
As a last resort, you can also uninstall the version installed by your package manager and follow our manual installation instructions. For that, remove the distribution's package, with a line like
sudo apt-get remove -y youtube-dl
Afterwards, simply follow [our manual installation instructions](http://rg3.github.io/youtube-dl/download.html):
Afterwards, simply follow [our manual installation instructions](https://rg3.github.io/youtube-dl/download.html):
```
sudo wget https://yt-dl.org/latest/youtube-dl -O /usr/local/bin/youtube-dl
@@ -766,11 +766,11 @@ Apparently YouTube requires you to pass a CAPTCHA test if you download too much.
youtube-dl works fine on its own on most sites. However, if you want to convert video/audio, you'll need [avconv](https://libav.org/) or [ffmpeg](https://www.ffmpeg.org/). On some sites - most notably YouTube - videos can be retrieved in a higher quality format without sound. youtube-dl will detect whether avconv/ffmpeg is present and automatically pick the best option.
Videos or video formats streamed via RTMP protocol can only be downloaded when [rtmpdump](https://rtmpdump.mplayerhq.hu/) is installed. Downloading MMS and RTSP videos requires either [mplayer](http://mplayerhq.hu/) or [mpv](https://mpv.io/) to be installed.
Videos or video formats streamed via RTMP protocol can only be downloaded when [rtmpdump](https://rtmpdump.mplayerhq.hu/) is installed. Downloading MMS and RTSP videos requires either [mplayer](https://mplayerhq.hu/) or [mpv](https://mpv.io/) to be installed.
### I have downloaded a video but how can I play it?
Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](http://www.videolan.org/) or [mplayer](http://www.mplayerhq.hu/).
Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](https://www.videolan.org/) or [mplayer](https://www.mplayerhq.hu/).
### I extracted a video URL with `-g`, but it does not play on another machine / in my web browser.
@@ -845,10 +845,10 @@ Use the `-o` to specify an [output template](#output-template), for example `-o
### How do I download a video starting with a `-`?
Either prepend `http://www.youtube.com/watch?v=` or separate the ID from the options with `--`:
Either prepend `https://www.youtube.com/watch?v=` or separate the ID from the options with `--`:
youtube-dl -- -wNyEUrxzFU
youtube-dl "http://www.youtube.com/watch?v=-wNyEUrxzFU"
youtube-dl "https://www.youtube.com/watch?v=-wNyEUrxzFU"
### How do I pass cookies to youtube-dl?
@@ -862,9 +862,9 @@ Passing cookies to youtube-dl is a good way to workaround login when a particula
### How do I stream directly to media player?
You will first need to tell youtube-dl to stream media to stdout with `-o -`, and also tell your media player to read from stdin (it must be capable of this for streaming) and then pipe former to latter. For example, streaming to [vlc](http://www.videolan.org/) can be achieved with:
You will first need to tell youtube-dl to stream media to stdout with `-o -`, and also tell your media player to read from stdin (it must be capable of this for streaming) and then pipe former to latter. For example, streaming to [vlc](https://www.videolan.org/) can be achieved with:
youtube-dl -o - "http://www.youtube.com/watch?v=BaW_jenozKcj" | vlc -
youtube-dl -o - "https://www.youtube.com/watch?v=BaW_jenozKcj" | vlc -
### How do I download only new videos from a playlist?
@@ -884,7 +884,7 @@ When youtube-dl detects an HLS video, it can download it either with the built-i
When youtube-dl knows that one particular downloader works better for a given website, that downloader will be picked. Otherwise, youtube-dl will pick the best downloader for general compatibility, which at the moment happens to be ffmpeg. This choice may change in future versions of youtube-dl, with improvements of the built-in downloader and/or ffmpeg.
In particular, the generic extractor (used when your website is not in the [list of supported sites by youtube-dl](http://rg3.github.io/youtube-dl/supportedsites.html) cannot mandate one specific downloader.
In particular, the generic extractor (used when your website is not in the [list of supported sites by youtube-dl](https://rg3.github.io/youtube-dl/supportedsites.html) cannot mandate one specific downloader.
If you put either `--hls-prefer-native` or `--hls-prefer-ffmpeg` into your configuration, a different subset of videos will fail to download correctly. Instead, it is much better to [file an issue](https://yt-dl.org/bug) or a pull request which details why the native or the ffmpeg HLS downloader is a better choice for your use case.
@@ -910,7 +910,7 @@ Feel free to bump the issue from time to time by writing a small comment ("Issue
### How can I detect whether a given URL is supported by youtube-dl?
For one, have a look at the [list of supported sites](docs/supportedsites.md). Note that it can sometimes happen that the site changes its URL scheme (say, from http://example.com/video/1234567 to http://example.com/v/1234567 ) and youtube-dl reports an URL of a service in that list as unsupported. In that case, simply report a bug.
For one, have a look at the [list of supported sites](docs/supportedsites.md). Note that it can sometimes happen that the site changes its URL scheme (say, from https://example.com/video/1234567 to https://example.com/v/1234567 ) and youtube-dl reports an URL of a service in that list as unsupported. In that case, simply report a bug.
It is *not* possible to detect whether a URL is supported or not. That's because youtube-dl contains a generic extractor which matches **all** URLs. You may be tempted to disable, exclude, or remove the generic extractor, but the generic extractor not only allows users to extract videos from lots of websites that embed a video from another service, but may also be used to extract video from a service that it's hosting itself. Therefore, we neither recommend nor support disabling, excluding, or removing the generic extractor.
@@ -924,7 +924,7 @@ youtube-dl is an open-source project manned by too few volunteers, so we'd rathe
# DEVELOPER INSTRUCTIONS
Most users do not need to build youtube-dl and can [download the builds](http://rg3.github.io/youtube-dl/download.html) or get them from their distribution.
Most users do not need to build youtube-dl and can [download the builds](https://rg3.github.io/youtube-dl/download.html) or get them from their distribution.
To run youtube-dl as a developer, you don't need to build anything either. Simply execute
@@ -972,7 +972,7 @@ After you have ensured this site is distributing its content legally, you can fo
class YourExtractorIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://yourextractor.com/watch/42',
'url': 'https://yourextractor.com/watch/42',
'md5': 'TODO: md5 sum of the first 10241 bytes of the video file (use --test)',
'info_dict': {
'id': '42',
@@ -1005,8 +1005,8 @@ After you have ensured this site is distributing its content legally, you can fo
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py
@@ -1162,7 +1162,7 @@ import youtube_dl
ydl_opts = {}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```
Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L129-L279). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
@@ -1201,19 +1201,19 @@ ydl_opts = {
'progress_hooks': [my_hook],
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```
# BUGS
Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](http://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
Bugs and suggestions should be reported at: <https://github.com/rg3/youtube-dl/issues>. Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](https://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
```
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Command-line args: [u'-v', u'https://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2015.12.06
[debug] Git HEAD: 135392e
@@ -1244,7 +1244,7 @@ For bug reports, this means that your report should contain the *complete* outpu
If your server has multiple IPs or you suspect censorship, adding `--call-home` may be a good idea to get more diagnostics. If the error is `ERROR: Unable to extract ...` and you cannot reproduce it from multiple countries, add `--dump-pages` (warning: this will yield a rather large output, redirect it to the file `log.txt` by adding `>log.txt 2>&1` to your command-line) or upload the `.dump` files you get when you add `--write-pages` [somewhere](https://gist.github.com/).
**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `http://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `http://www.youtube.com/`) is *not* an example URL.
**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `https://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `https://www.youtube.com/`) is *not* an example URL.
### Are you using the latest version?

View File

@@ -156,6 +156,7 @@
- **Cinchcast**
- **CJSW**
- **cliphunter**
- **Clippit**
- **ClipRs**
- **Clipsyndicate**
- **CloserToTruth**
@@ -362,6 +363,7 @@
- **IPrima**
- **iqiyi**: 爱奇艺
- **Ir90Tv**
- **ITTF**
- **ITV**
- **ivi**: ivi.ru
- **ivi:compilation**: ivi.ru compilations
@@ -418,6 +420,7 @@
- **limelight:channel_list**
- **LiTV**
- **LiveLeak**
- **LiveLeakEmbed**
- **livestream**
- **livestream:original**
- **LnkGo**

View File

@@ -10,6 +10,7 @@ import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, expect_dict, expect_value
from youtube_dl.compat import compat_etree_fromstring
from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import YoutubeIE, get_info_extractor
from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError
@@ -488,6 +489,91 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
self.ie._sort_formats(formats)
expect_value(self, formats, expected_formats, None)
def test_parse_mpd_formats(self):
_TEST_CASES = [
(
# https://github.com/rg3/youtube-dl/issues/13919
'float_duration',
'http://unknown/manifest.mpd',
[{
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': '318597',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.42001f',
'tbr': 318.597,
'width': 340,
'height': 192,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': '638590',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.42001f',
'tbr': 638.59,
'width': 512,
'height': 288,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': '1022565',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.4d001f',
'tbr': 1022.565,
'width': 688,
'height': 384,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': '2046506',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.4d001f',
'tbr': 2046.506,
'width': 1024,
'height': 576,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': '3998017',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.640029',
'tbr': 3998.017,
'width': 1280,
'height': 720,
}, {
'manifest_url': 'http://unknown/manifest.mpd',
'ext': 'mp4',
'format_id': '5997485',
'format_note': 'DASH video',
'protocol': 'http_dash_segments',
'acodec': 'none',
'vcodec': 'avc1.640032',
'tbr': 5997.485,
'width': 1920,
'height': 1080,
}]
),
]
for mpd_file, mpd_url, expected_formats in _TEST_CASES:
with io.open('./test/testdata/mpd/%s.mpd' % mpd_file,
mode='r', encoding='utf-8') as f:
formats = self.ie._parse_mpd_formats(
compat_etree_fromstring(f.read().encode('utf-8')),
mpd_url=mpd_url)
self.ie._sort_formats(formats)
expect_value(self, formats, expected_formats, None)
if __name__ == '__main__':
unittest.main()

View File

@@ -279,6 +279,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unescapeHTML('&#47;'), '/')
self.assertEqual(unescapeHTML('&eacute;'), 'é')
self.assertEqual(unescapeHTML('&#2013266066;'), '&#2013266066;')
self.assertEqual(unescapeHTML('&a&quot;'), '&a"')
# HTML5 entities
self.assertEqual(unescapeHTML('&period;&apos;'), '.\'')

18
test/testdata/mpd/float_duration.mpd vendored Normal file
View File

@@ -0,0 +1,18 @@
<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:mpeg:dash:schema:mpd:2011" type="static" minBufferTime="PT2S" profiles="urn:mpeg:dash:profile:isoff-on-demand:2011" mediaPresentationDuration="PT6014S">
<Period bitstreamSwitching="true">
<AdaptationSet mimeType="audio/mp4" codecs="mp4a.40.2" startWithSAP="1" segmentAlignment="true">
<SegmentTemplate timescale="1000000" presentationTimeOffset="0" initialization="ai_$RepresentationID$.mp4d" media="a_$RepresentationID$_$Number$.mp4d" duration="2000000.0" startNumber="0"></SegmentTemplate>
<Representation id="318597" bandwidth="61587"></Representation>
</AdaptationSet>
<AdaptationSet mimeType="video/mp4" startWithSAP="1" segmentAlignment="true">
<SegmentTemplate timescale="1000000" presentationTimeOffset="0" initialization="vi_$RepresentationID$.mp4d" media="v_$RepresentationID$_$Number$.mp4d" duration="2000000.0" startNumber="0"></SegmentTemplate>
<Representation id="318597" codecs="avc1.42001f" width="340" height="192" bandwidth="318597"></Representation>
<Representation id="638590" codecs="avc1.42001f" width="512" height="288" bandwidth="638590"></Representation>
<Representation id="1022565" codecs="avc1.4d001f" width="688" height="384" bandwidth="1022565"></Representation>
<Representation id="2046506" codecs="avc1.4d001f" width="1024" height="576" bandwidth="2046506"></Representation>
<Representation id="3998017" codecs="avc1.640029" width="1280" height="720" bandwidth="3998017"></Representation>
<Representation id="5997485" codecs="avc1.640032" width="1920" height="1080" bandwidth="5997485"></Representation>
</AdaptationSet>
</Period>
</MPD>

View File

@@ -1483,12 +1483,14 @@ class YoutubeDL(object):
def is_wellformed(f):
url = f.get('url')
valid_url = url and isinstance(url, compat_str)
if not valid_url:
if not url:
self.report_warning(
'"url" field is missing or empty - skipping format, '
'there is an error in extractor')
return valid_url
return False
if isinstance(url, bytes):
sanitize_string_field(f, 'url')
return True
# Filter out malformed formats for better extraction robustness
formats = list(filter(is_wellformed, formats))

View File

@@ -304,11 +304,11 @@ class FileDownloader(object):
"""Report attempt to resume at given byte."""
self.to_screen('[download] Resuming download at byte %s' % resume_len)
def report_retry(self, count, retries):
def report_retry(self, err, count, retries):
"""Report retry in case of HTTP error 5xx"""
self.to_screen(
'[download] Got server HTTP error. Retrying (attempt %d of %s)...'
% (count, self.format_retries(retries)))
'[download] Got server HTTP error: %s. Retrying (attempt %d of %s)...'
% (error_to_compat_str(err), count, self.format_retries(retries)))
def report_file_already_downloaded(self, file_name):
"""Report file has already been fully downloaded."""

View File

@@ -22,8 +22,16 @@ from ..utils import (
class HttpFD(FileDownloader):
def real_download(self, filename, info_dict):
url = info_dict['url']
tmpfilename = self.temp_name(filename)
stream = None
class DownloadContext(dict):
__getattr__ = dict.get
__setattr__ = dict.__setitem__
__delattr__ = dict.__delitem__
ctx = DownloadContext()
ctx.filename = filename
ctx.tmpfilename = self.temp_name(filename)
ctx.stream = None
# Do not include the Accept-Encoding header
headers = {'Youtubedl-no-compression': 'True'}
@@ -38,46 +46,51 @@ class HttpFD(FileDownloader):
if is_test:
request.add_header('Range', 'bytes=0-%s' % str(self._TEST_FILE_SIZE - 1))
# Establish possible resume length
if os.path.isfile(encodeFilename(tmpfilename)):
resume_len = os.path.getsize(encodeFilename(tmpfilename))
else:
resume_len = 0
ctx.open_mode = 'wb'
ctx.resume_len = 0
open_mode = 'wb'
if resume_len != 0:
if self.params.get('continuedl', True):
self.report_resuming_byte(resume_len)
request.add_header('Range', 'bytes=%d-' % resume_len)
open_mode = 'ab'
else:
resume_len = 0
if self.params.get('continuedl', True):
# Establish possible resume length
if os.path.isfile(encodeFilename(ctx.tmpfilename)):
ctx.resume_len = os.path.getsize(encodeFilename(ctx.tmpfilename))
count = 0
retries = self.params.get('retries', 0)
while count <= retries:
class SucceedDownload(Exception):
pass
class RetryDownload(Exception):
def __init__(self, source_error):
self.source_error = source_error
def establish_connection():
if ctx.resume_len != 0:
self.report_resuming_byte(ctx.resume_len)
request.add_header('Range', 'bytes=%d-' % ctx.resume_len)
ctx.open_mode = 'ab'
# Establish connection
try:
data = self.ydl.urlopen(request)
ctx.data = self.ydl.urlopen(request)
# When trying to resume, Content-Range HTTP header of response has to be checked
# to match the value of requested Range HTTP header. This is due to a webservers
# that don't support resuming and serve a whole file with no Content-Range
# set in response despite of requested Range (see
# https://github.com/rg3/youtube-dl/issues/6057#issuecomment-126129799)
if resume_len > 0:
content_range = data.headers.get('Content-Range')
if ctx.resume_len > 0:
content_range = ctx.data.headers.get('Content-Range')
if content_range:
content_range_m = re.search(r'bytes (\d+)-', content_range)
# Content-Range is present and matches requested Range, resume is possible
if content_range_m and resume_len == int(content_range_m.group(1)):
break
if content_range_m and ctx.resume_len == int(content_range_m.group(1)):
return
# Content-Range is either not present or invalid. Assuming remote webserver is
# trying to send the whole file, resume is not possible, so wiping the local file
# and performing entire redownload
self.report_unable_to_resume()
resume_len = 0
open_mode = 'wb'
break
ctx.resume_len = 0
ctx.open_mode = 'wb'
return
except (compat_urllib_error.HTTPError, ) as err:
if (err.code < 500 or err.code >= 600) and err.code != 416:
# Unexpected HTTP error
@@ -86,15 +99,15 @@ class HttpFD(FileDownloader):
# Unable to resume (requested range not satisfiable)
try:
# Open the connection again without the range header
data = self.ydl.urlopen(basic_request)
content_length = data.info()['Content-Length']
ctx.data = self.ydl.urlopen(basic_request)
content_length = ctx.data.info()['Content-Length']
except (compat_urllib_error.HTTPError, ) as err:
if err.code < 500 or err.code >= 600:
raise
else:
# Examine the reported length
if (content_length is not None and
(resume_len - 100 < int(content_length) < resume_len + 100)):
(ctx.resume_len - 100 < int(content_length) < ctx.resume_len + 100)):
# The file had already been fully downloaded.
# Explanation to the above condition: in issue #175 it was revealed that
# YouTube sometimes adds or removes a few bytes from the end of the file,
@@ -102,152 +115,184 @@ class HttpFD(FileDownloader):
# I decided to implement a suggested change and consider the file
# completely downloaded if the file size differs less than 100 bytes from
# the one in the hard drive.
self.report_file_already_downloaded(filename)
self.try_rename(tmpfilename, filename)
self.report_file_already_downloaded(ctx.filename)
self.try_rename(ctx.tmpfilename, ctx.filename)
self._hook_progress({
'filename': filename,
'filename': ctx.filename,
'status': 'finished',
'downloaded_bytes': resume_len,
'total_bytes': resume_len,
'downloaded_bytes': ctx.resume_len,
'total_bytes': ctx.resume_len,
})
return True
raise SucceedDownload()
else:
# The length does not match, we start the download over
self.report_unable_to_resume()
resume_len = 0
open_mode = 'wb'
break
except socket.error as e:
if e.errno != errno.ECONNRESET:
ctx.resume_len = 0
ctx.open_mode = 'wb'
return
raise RetryDownload(err)
except socket.error as err:
if err.errno != errno.ECONNRESET:
# Connection reset is no problem, just retry
raise
raise RetryDownload(err)
# Retry
count += 1
if count <= retries:
self.report_retry(count, retries)
def download():
data_len = ctx.data.info().get('Content-length', None)
if count > retries:
self.report_error('giving up after %s retries' % retries)
return False
# Range HTTP header may be ignored/unsupported by a webserver
# (e.g. extractor/scivee.py, extractor/bambuser.py).
# However, for a test we still would like to download just a piece of a file.
# To achieve this we limit data_len to _TEST_FILE_SIZE and manually control
# block size when downloading a file.
if is_test and (data_len is None or int(data_len) > self._TEST_FILE_SIZE):
data_len = self._TEST_FILE_SIZE
data_len = data.info().get('Content-length', None)
# Range HTTP header may be ignored/unsupported by a webserver
# (e.g. extractor/scivee.py, extractor/bambuser.py).
# However, for a test we still would like to download just a piece of a file.
# To achieve this we limit data_len to _TEST_FILE_SIZE and manually control
# block size when downloading a file.
if is_test and (data_len is None or int(data_len) > self._TEST_FILE_SIZE):
data_len = self._TEST_FILE_SIZE
if data_len is not None:
data_len = int(data_len) + resume_len
min_data_len = self.params.get('min_filesize')
max_data_len = self.params.get('max_filesize')
if min_data_len is not None and data_len < min_data_len:
self.to_screen('\r[download] File is smaller than min-filesize (%s bytes < %s bytes). Aborting.' % (data_len, min_data_len))
return False
if max_data_len is not None and data_len > max_data_len:
self.to_screen('\r[download] File is larger than max-filesize (%s bytes > %s bytes). Aborting.' % (data_len, max_data_len))
return False
byte_counter = 0 + resume_len
block_size = self.params.get('buffersize', 1024)
start = time.time()
# measure time over whole while-loop, so slow_down() and best_block_size() work together properly
now = None # needed for slow_down() in the first loop run
before = start # start measuring
while True:
# Download and write
data_block = data.read(block_size if not is_test else min(block_size, data_len - byte_counter))
byte_counter += len(data_block)
# exit loop when download is finished
if len(data_block) == 0:
break
# Open destination file just in time
if stream is None:
try:
(stream, tmpfilename) = sanitize_open(tmpfilename, open_mode)
assert stream is not None
filename = self.undo_temp_name(tmpfilename)
self.report_destination(filename)
except (OSError, IOError) as err:
self.report_error('unable to open for writing: %s' % str(err))
if data_len is not None:
data_len = int(data_len) + ctx.resume_len
min_data_len = self.params.get('min_filesize')
max_data_len = self.params.get('max_filesize')
if min_data_len is not None and data_len < min_data_len:
self.to_screen('\r[download] File is smaller than min-filesize (%s bytes < %s bytes). Aborting.' % (data_len, min_data_len))
return False
if max_data_len is not None and data_len > max_data_len:
self.to_screen('\r[download] File is larger than max-filesize (%s bytes > %s bytes). Aborting.' % (data_len, max_data_len))
return False
if self.params.get('xattr_set_filesize', False) and data_len is not None:
byte_counter = 0 + ctx.resume_len
block_size = self.params.get('buffersize', 1024)
start = time.time()
# measure time over whole while-loop, so slow_down() and best_block_size() work together properly
now = None # needed for slow_down() in the first loop run
before = start # start measuring
def retry(e):
if ctx.tmpfilename != '-':
ctx.stream.close()
ctx.stream = None
ctx.resume_len = os.path.getsize(encodeFilename(ctx.tmpfilename))
raise RetryDownload(e)
while True:
try:
# Download and write
data_block = ctx.data.read(block_size if not is_test else min(block_size, data_len - byte_counter))
# socket.timeout is a subclass of socket.error but may not have
# errno set
except socket.timeout as e:
retry(e)
except socket.error as e:
if e.errno not in (errno.ECONNRESET, errno.ETIMEDOUT):
raise
retry(e)
byte_counter += len(data_block)
# exit loop when download is finished
if len(data_block) == 0:
break
# Open destination file just in time
if ctx.stream is None:
try:
write_xattr(tmpfilename, 'user.ytdl.filesize', str(data_len).encode('utf-8'))
except (XAttrUnavailableError, XAttrMetadataError) as err:
self.report_error('unable to set filesize xattr: %s' % str(err))
ctx.stream, ctx.tmpfilename = sanitize_open(
ctx.tmpfilename, ctx.open_mode)
assert ctx.stream is not None
ctx.filename = self.undo_temp_name(ctx.tmpfilename)
self.report_destination(ctx.filename)
except (OSError, IOError) as err:
self.report_error('unable to open for writing: %s' % str(err))
return False
try:
stream.write(data_block)
except (IOError, OSError) as err:
if self.params.get('xattr_set_filesize', False) and data_len is not None:
try:
write_xattr(ctx.tmpfilename, 'user.ytdl.filesize', str(data_len).encode('utf-8'))
except (XAttrUnavailableError, XAttrMetadataError) as err:
self.report_error('unable to set filesize xattr: %s' % str(err))
try:
ctx.stream.write(data_block)
except (IOError, OSError) as err:
self.to_stderr('\n')
self.report_error('unable to write data: %s' % str(err))
return False
# Apply rate limit
self.slow_down(start, now, byte_counter - ctx.resume_len)
# end measuring of one loop run
now = time.time()
after = now
# Adjust block size
if not self.params.get('noresizebuffer', False):
block_size = self.best_block_size(after - before, len(data_block))
before = after
# Progress message
speed = self.calc_speed(start, now, byte_counter - ctx.resume_len)
if data_len is None:
eta = None
else:
eta = self.calc_eta(start, time.time(), data_len - ctx.resume_len, byte_counter - ctx.resume_len)
self._hook_progress({
'status': 'downloading',
'downloaded_bytes': byte_counter,
'total_bytes': data_len,
'tmpfilename': ctx.tmpfilename,
'filename': ctx.filename,
'eta': eta,
'speed': speed,
'elapsed': now - start,
})
if is_test and byte_counter == data_len:
break
if ctx.stream is None:
self.to_stderr('\n')
self.report_error('unable to write data: %s' % str(err))
self.report_error('Did not get any data blocks')
return False
if ctx.tmpfilename != '-':
ctx.stream.close()
# Apply rate limit
self.slow_down(start, now, byte_counter - resume_len)
if data_len is not None and byte_counter != data_len:
err = ContentTooShortError(byte_counter, int(data_len))
if count <= retries:
retry(err)
raise err
# end measuring of one loop run
now = time.time()
after = now
self.try_rename(ctx.tmpfilename, ctx.filename)
# Adjust block size
if not self.params.get('noresizebuffer', False):
block_size = self.best_block_size(after - before, len(data_block))
before = after
# Progress message
speed = self.calc_speed(start, now, byte_counter - resume_len)
if data_len is None:
eta = None
else:
eta = self.calc_eta(start, time.time(), data_len - resume_len, byte_counter - resume_len)
# Update file modification time
if self.params.get('updatetime', True):
info_dict['filetime'] = self.try_utime(ctx.filename, ctx.data.info().get('last-modified', None))
self._hook_progress({
'status': 'downloading',
'downloaded_bytes': byte_counter,
'total_bytes': data_len,
'tmpfilename': tmpfilename,
'filename': filename,
'eta': eta,
'speed': speed,
'elapsed': now - start,
'total_bytes': byte_counter,
'filename': ctx.filename,
'status': 'finished',
'elapsed': time.time() - start,
})
if is_test and byte_counter == data_len:
break
return True
if stream is None:
self.to_stderr('\n')
self.report_error('Did not get any data blocks')
return False
if tmpfilename != '-':
stream.close()
while count <= retries:
try:
establish_connection()
download()
return True
except RetryDownload as e:
count += 1
if count <= retries:
self.report_retry(e.source_error, count, retries)
continue
except SucceedDownload:
return True
if data_len is not None and byte_counter != data_len:
raise ContentTooShortError(byte_counter, int(data_len))
self.try_rename(tmpfilename, filename)
# Update file modification time
if self.params.get('updatetime', True):
info_dict['filetime'] = self.try_utime(filename, data.info().get('last-modified', None))
self._hook_progress({
'downloaded_bytes': byte_counter,
'total_bytes': byte_counter,
'filename': filename,
'status': 'finished',
'elapsed': time.time() - start,
})
return True
self.report_error('giving up after %s retries' % retries)
return False

View File

@@ -9,12 +9,13 @@ from ..compat import (
compat_urllib_parse_urlparse,
)
from ..utils import (
ExtractorError,
find_xpath_attr,
unified_strdate,
get_element_by_attribute,
int_or_none,
NO_DEFAULT,
qualities,
unified_strdate,
)
# There are different sources of video in arte.tv, the extraction process
@@ -79,6 +80,13 @@ class ArteTVBaseIE(InfoExtractor):
info = self._download_json(json_url, video_id)
player_info = info['videoJsonPlayer']
vsr = player_info['VSR']
if not vsr and not player_info.get('VRU'):
raise ExtractorError(
'Video %s is not available' % player_info.get('VID') or video_id,
expected=True)
upload_date_str = player_info.get('shootingDate')
if not upload_date_str:
upload_date_str = (player_info.get('VRA') or player_info.get('VDA') or '').split(' ')[0]
@@ -107,7 +115,7 @@ class ArteTVBaseIE(InfoExtractor):
langcode = LANGS.get(lang, lang)
formats = []
for format_id, format_dict in player_info['VSR'].items():
for format_id, format_dict in vsr.items():
f = dict(format_dict)
versionCode = f.get('versionCode')
l = re.escape(langcode)

View File

@@ -242,7 +242,12 @@ class BandcampAlbumIE(InfoExtractor):
raise ExtractorError('The page doesn\'t contain any tracks')
# Only tracks with duration info have songs
entries = [
self.url_result(compat_urlparse.urljoin(url, t_path), ie=BandcampIE.ie_key())
self.url_result(
compat_urlparse.urljoin(url, t_path),
ie=BandcampIE.ie_key(),
video_title=self._search_regex(
r'<span\b[^>]+\bitemprop=["\']name["\'][^>]*>([^<]+)',
elem_content, 'track title', fatal=False))
for elem_content, t_path in track_elements
if self._html_search_meta('duration', elem_content, default=None)]

View File

@@ -37,7 +37,8 @@ class BBCCoUkIE(InfoExtractor):
programmes/(?!articles/)|
iplayer(?:/[^/]+)?/(?:episode/|playlist/)|
music/(?:clips|audiovideo/popular)[/#]|
radio/player/
radio/player/|
events/[^/]+/play/[^/]+/
)
(?P<id>%s)(?!/(?:episodes|broadcasts|clips))
''' % _ID_REGEX

View File

@@ -200,6 +200,7 @@ class CBCWatchBaseIE(InfoExtractor):
'media': 'http://search.yahoo.com/mrss/',
'clearleap': 'http://www.clearleap.com/namespace/clearleap/1.0/',
}
_GEO_COUNTRIES = ['CA']
def _call_api(self, path, video_id):
url = path if path.startswith('http') else self._API_BASE_URL + path
@@ -287,6 +288,11 @@ class CBCWatchBaseIE(InfoExtractor):
class CBCWatchVideoIE(CBCWatchBaseIE):
IE_NAME = 'cbc.ca:watch:video'
_VALID_URL = r'https?://api-cbc\.cloud\.clearleap\.com/cloffice/client/web/play/?\?.*?\bcontentId=(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TEST = {
# geo-restricted to Canada, bypassable
'url': 'https://api-cbc.cloud.clearleap.com/cloffice/client/web/play/?contentId=3c84472a-1eea-4dee-9267-2655d5055dcf&categoryId=ebc258f5-ee40-4cca-b66b-ba6bd55b7235',
'only_matching': True,
}
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -323,9 +329,10 @@ class CBCWatchIE(CBCWatchBaseIE):
IE_NAME = 'cbc.ca:watch'
_VALID_URL = r'https?://watch\.cbc\.ca/(?:[^/]+/)+(?P<id>[0-9a-f-]+)'
_TESTS = [{
# geo-restricted to Canada, bypassable
'url': 'http://watch.cbc.ca/doc-zone/season-6/customer-disservice/38e815a-009e3ab12e4',
'info_dict': {
'id': '38e815a-009e3ab12e4',
'id': '9673749a-5e77-484c-8b62-a1092a6b5168',
'ext': 'mp4',
'title': 'Customer (Dis)Service',
'description': 'md5:8bdd6913a0fe03d4b2a17ebe169c7c87',
@@ -337,8 +344,8 @@ class CBCWatchIE(CBCWatchBaseIE):
'skip_download': True,
'format': 'bestvideo',
},
'skip': 'Geo-restricted to Canada',
}, {
# geo-restricted to Canada, bypassable
'url': 'http://watch.cbc.ca/arthur/all/1ed4b385-cd84-49cf-95f0-80f004680057',
'info_dict': {
'id': '1ed4b385-cd84-49cf-95f0-80f004680057',
@@ -346,7 +353,6 @@ class CBCWatchIE(CBCWatchBaseIE):
'description': 'Arthur, the sweetest 8-year-old aardvark, and his pals solve all kinds of problems with humour, kindness and teamwork.',
},
'playlist_mincount': 30,
'skip': 'Geo-restricted to Canada',
}]
def _real_extract(self, url):

View File

@@ -124,7 +124,7 @@ class CDAIE(InfoExtractor):
}
def extract_format(page, version):
json_str = self._search_regex(
json_str = self._html_search_regex(
r'player_data=(\\?["\'])(?P<player_data>.+?)\1', page,
'%s player_json' % version, fatal=False, group='player_data')
if not json_str:

View File

@@ -0,0 +1,74 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
parse_iso8601,
qualities,
)
import re
class ClippitIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?clippituser\.tv/c/(?P<id>[a-z]+)'
_TEST = {
'url': 'https://www.clippituser.tv/c/evmgm',
'md5': '963ae7a59a2ec4572ab8bf2f2d2c5f09',
'info_dict': {
'id': 'evmgm',
'ext': 'mp4',
'title': 'Bye bye Brutus. #BattleBots - Clippit',
'uploader': 'lizllove',
'uploader_url': 'https://www.clippituser.tv/p/lizllove',
'timestamp': 1472183818,
'upload_date': '20160826',
'description': 'BattleBots | ABC',
'thumbnail': r're:^https?://.*\.jpg$',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<title.*>(.+?)</title>', webpage, 'title')
FORMATS = ('sd', 'hd')
quality = qualities(FORMATS)
formats = []
for format_id in FORMATS:
url = self._html_search_regex(r'data-%s-file="(.+?)"' % format_id,
webpage, 'url', fatal=False)
if not url:
continue
match = re.search(r'/(?P<height>\d+)\.mp4', url)
formats.append({
'url': url,
'format_id': format_id,
'quality': quality(format_id),
'height': int(match.group('height')) if match else None,
})
uploader = self._html_search_regex(r'class="username".*>\s+(.+?)\n',
webpage, 'uploader', fatal=False)
uploader_url = ('https://www.clippituser.tv/p/' + uploader
if uploader else None)
timestamp = self._html_search_regex(r'datetime="(.+?)"',
webpage, 'date', fatal=False)
thumbnail = self._html_search_regex(r'data-image="(.+?)"',
webpage, 'thumbnail', fatal=False)
return {
'id': video_id,
'title': title,
'formats': formats,
'uploader': uploader,
'uploader_url': uploader_url,
'timestamp': parse_iso8601(timestamp),
'description': self._og_search_description(webpage),
'thumbnail': thumbnail,
}

View File

@@ -27,6 +27,7 @@ from ..compat import (
compat_urllib_parse_urlencode,
compat_urllib_request,
compat_urlparse,
compat_xml_parse_error,
)
from ..downloader.f4m import remove_encrypted_media
from ..utils import (
@@ -646,15 +647,29 @@ class InfoExtractor(object):
def _download_xml(self, url_or_request, video_id,
note='Downloading XML', errnote='Unable to download XML',
transform_source=None, fatal=True, encoding=None, data=None, headers={}, query={}):
transform_source=None, fatal=True, encoding=None,
data=None, headers={}, query={}):
"""Return the xml as an xml.etree.ElementTree.Element"""
xml_string = self._download_webpage(
url_or_request, video_id, note, errnote, fatal=fatal, encoding=encoding, data=data, headers=headers, query=query)
url_or_request, video_id, note, errnote, fatal=fatal,
encoding=encoding, data=data, headers=headers, query=query)
if xml_string is False:
return xml_string
return self._parse_xml(
xml_string, video_id, transform_source=transform_source,
fatal=fatal)
def _parse_xml(self, xml_string, video_id, transform_source=None, fatal=True):
if transform_source:
xml_string = transform_source(xml_string)
return compat_etree_fromstring(xml_string.encode('utf-8'))
try:
return compat_etree_fromstring(xml_string.encode('utf-8'))
except compat_xml_parse_error as ve:
errmsg = '%s: Failed to parse XML ' % video_id
if fatal:
raise ExtractorError(errmsg, cause=ve)
else:
self.report_warning(errmsg + str(ve))
def _download_json(self, url_or_request, video_id,
note='Downloading JSON metadata',
@@ -1786,7 +1801,7 @@ class InfoExtractor(object):
ms_info['timescale'] = int(timescale)
segment_duration = source.get('duration')
if segment_duration:
ms_info['segment_duration'] = int(segment_duration)
ms_info['segment_duration'] = float(segment_duration)
def extract_Initialization(source):
initialization = source.find(_add_ns('Initialization'))
@@ -2123,11 +2138,11 @@ class InfoExtractor(object):
formats = self._extract_m3u8_formats(
full_url, video_id, ext='mp4',
entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id,
preference=preference)
preference=preference, fatal=False)
elif ext == 'mpd':
is_plain_url = False
formats = self._extract_mpd_formats(
full_url, video_id, mpd_id=mpd_id)
full_url, video_id, mpd_id=mpd_id, fatal=False)
else:
is_plain_url = True
formats = [{
@@ -2169,6 +2184,12 @@ class InfoExtractor(object):
f = parse_content_type(source_attributes.get('type'))
is_plain_url, formats = _media_formats(src, media_type, f)
if is_plain_url:
# res attribute is not standard but seen several times
# in the wild
f.update({
'height': int_or_none(source_attributes.get('res')),
'format_id': source_attributes.get('label'),
})
f.update(formats[0])
media_info['formats'].append(f)
else:

View File

@@ -187,6 +187,7 @@ from .chirbit import (
from .cinchcast import CinchcastIE
from .cjsw import CJSWIE
from .cliphunter import CliphunterIE
from .clippit import ClippitIE
from .cliprs import ClipRsIE
from .clipsyndicate import ClipsyndicateIE
from .closertotruth import CloserToTruthIE
@@ -508,6 +509,7 @@ from .la7 import LA7IE
from .laola1tv import (
Laola1TvEmbedIE,
Laola1TvIE,
ITTFIE,
)
from .lci import LCIIE
from .lcp import (
@@ -535,7 +537,10 @@ from .limelight import (
LimelightChannelListIE,
)
from .litv import LiTVIE
from .liveleak import LiveLeakIE
from .liveleak import (
LiveLeakIE,
LiveLeakEmbedIE,
)
from .livestream import (
LivestreamIE,
LivestreamOriginalIE,

View File

@@ -1519,14 +1519,27 @@ class GenericIE(InfoExtractor):
# LiveLeak embed
{
'url': 'http://www.wykop.pl/link/3088787/',
'md5': 'ace83b9ed19b21f68e1b50e844fdf95d',
'md5': '7619da8c820e835bef21a1efa2a0fc71',
'info_dict': {
'id': '874_1459135191',
'ext': 'mp4',
'title': 'Man shows poor quality of new apartment building',
'description': 'The wall is like a sand pile.',
'uploader': 'Lake8737',
}
},
'add_ie': [LiveLeakIE.ie_key()],
},
# Another LiveLeak embed pattern (#13336)
{
'url': 'https://milo.yiannopoulos.net/2017/06/concealed-carry-robbery/',
'info_dict': {
'id': '2eb_1496309988',
'ext': 'mp4',
'title': 'Thief robs place where everyone was armed',
'description': 'md5:694d73ee79e535953cf2488562288eee',
'uploader': 'brazilwtf',
},
'add_ie': [LiveLeakIE.ie_key()],
},
# Duplicated embedded video URLs
{
@@ -2015,7 +2028,7 @@ class GenericIE(InfoExtractor):
if head_response is not False:
# Check for redirect
new_url = head_response.geturl()
new_url = compat_str(head_response.geturl())
if url != new_url:
self.report_following_redirect(new_url)
if force_videoid:
@@ -2116,7 +2129,7 @@ class GenericIE(InfoExtractor):
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'] = self._parse_mpd_formats(
doc, video_id,
mpd_base_url=full_response.geturl().rpartition('/')[0],
mpd_base_url=compat_str(full_response.geturl()).rpartition('/')[0],
mpd_url=url)
self._sort_formats(info_dict['formats'])
return info_dict
@@ -2757,9 +2770,9 @@ class GenericIE(InfoExtractor):
self._proto_relative_url(instagram_embed_url), InstagramIE.ie_key())
# Look for LiveLeak embeds
liveleak_url = LiveLeakIE._extract_url(webpage)
if liveleak_url:
return self.url_result(liveleak_url, 'LiveLeak')
liveleak_urls = LiveLeakIE._extract_urls(webpage)
if liveleak_urls:
return self.playlist_from_matches(liveleak_urls, video_id, video_title)
# Look for 3Q SDN embeds
threeqsdn_url = ThreeQSDNIE._extract_url(webpage)
@@ -2858,12 +2871,6 @@ class GenericIE(InfoExtractor):
merged[k] = v
return merged
# Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld(
webpage, video_id, default={}, expected_type='VideoObject')
if json_ld.get('url'):
return merge_dicts(json_ld, info_dict)
# Look for HTML5 media
entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
if entries:
@@ -2882,6 +2889,12 @@ class GenericIE(InfoExtractor):
jwplayer_data, video_id, require_title=False, base_url=url)
return merge_dicts(info, info_dict)
# Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld(
webpage, video_id, default={}, expected_type='VideoObject')
if json_ld.get('url'):
return merge_dicts(json_ld, info_dict)
def check_video(vurl):
if YoutubeIE.suitable(vurl):
return True

View File

@@ -7,6 +7,7 @@ from ..utils import (
ExtractorError,
int_or_none,
lowercase_escape,
update_url_query,
)
@@ -24,7 +25,14 @@ class GoogleDriveIE(InfoExtractor):
}, {
# video id is longer than 28 characters
'url': 'https://drive.google.com/file/d/1ENcQ_jeCuj7y19s66_Ou9dRP4GKGsodiDQ/edit',
'only_matching': True,
'md5': 'c230c67252874fddd8170e3fd1a45886',
'info_dict': {
'id': '1ENcQ_jeCuj7y19s66_Ou9dRP4GKGsodiDQ',
'ext': 'mp4',
'title': 'Andreea Banica feat Smiley - Hooky Song (Official Video).mp4',
'duration': 189,
},
'only_matching': True
}]
_FORMATS_EXT = {
'5': 'flv',
@@ -44,6 +52,13 @@ class GoogleDriveIE(InfoExtractor):
'46': 'webm',
'59': 'mp4',
}
_BASE_URL_CAPTIONS = 'https://drive.google.com/timedtext'
_CAPTIONS_ENTRY_TAG = {
'subtitles': 'track',
'automatic_captions': 'target',
}
_caption_formats_ext = []
_captions_xml = None
@staticmethod
def _extract_url(webpage):
@@ -53,21 +68,99 @@ class GoogleDriveIE(InfoExtractor):
if mobj:
return 'https://drive.google.com/file/d/%s' % mobj.group('id')
def _download_subtitles_xml(self, video_id, subtitles_id, hl):
if self._captions_xml:
return
self._captions_xml = self._download_xml(
self._BASE_URL_CAPTIONS, video_id, query={
'id': video_id,
'vid': subtitles_id,
'hl': hl,
'v': video_id,
'type': 'list',
'tlangs': '1',
'fmts': '1',
'vssids': '1',
}, note='Downloading subtitles XML',
errnote='Unable to download subtitles XML', fatal=False)
if self._captions_xml:
for f in self._captions_xml.findall('format'):
if f.attrib.get('fmt_code') and not f.attrib.get('default'):
self._caption_formats_ext.append(f.attrib['fmt_code'])
def _get_captions_by_type(self, video_id, subtitles_id, caption_type,
origin_lang_code=None):
if not subtitles_id or not caption_type:
return
captions = {}
for caption_entry in self._captions_xml.findall(
self._CAPTIONS_ENTRY_TAG[caption_type]):
caption_lang_code = caption_entry.attrib.get('lang_code')
if not caption_lang_code:
continue
caption_format_data = []
for caption_format in self._caption_formats_ext:
query = {
'vid': subtitles_id,
'v': video_id,
'fmt': caption_format,
'lang': (caption_lang_code if origin_lang_code is None
else origin_lang_code),
'type': 'track',
'name': '',
'kind': '',
}
if origin_lang_code is not None:
query.update({'tlang': caption_lang_code})
caption_format_data.append({
'url': update_url_query(self._BASE_URL_CAPTIONS, query),
'ext': caption_format,
})
captions[caption_lang_code] = caption_format_data
return captions
def _get_subtitles(self, video_id, subtitles_id, hl):
if not subtitles_id or not hl:
return
self._download_subtitles_xml(video_id, subtitles_id, hl)
if not self._captions_xml:
return
return self._get_captions_by_type(video_id, subtitles_id, 'subtitles')
def _get_automatic_captions(self, video_id, subtitles_id, hl):
if not subtitles_id or not hl:
return
self._download_subtitles_xml(video_id, subtitles_id, hl)
if not self._captions_xml:
return
track = self._captions_xml.find('track')
if track is None:
return
origin_lang_code = track.attrib.get('lang_code')
if not origin_lang_code:
return
return self._get_captions_by_type(
video_id, subtitles_id, 'automatic_captions', origin_lang_code)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://docs.google.com/file/d/%s' % video_id, video_id)
reason = self._search_regex(r'"reason"\s*,\s*"([^"]+)', webpage, 'reason', default=None)
reason = self._search_regex(
r'"reason"\s*,\s*"([^"]+)', webpage, 'reason', default=None)
if reason:
raise ExtractorError(reason)
title = self._search_regex(r'"title"\s*,\s*"([^"]+)', webpage, 'title')
duration = int_or_none(self._search_regex(
r'"length_seconds"\s*,\s*"([^"]+)', webpage, 'length seconds', default=None))
r'"length_seconds"\s*,\s*"([^"]+)', webpage, 'length seconds',
default=None))
fmt_stream_map = self._search_regex(
r'"fmt_stream_map"\s*,\s*"([^"]+)', webpage, 'fmt stream map').split(',')
fmt_list = self._search_regex(r'"fmt_list"\s*,\s*"([^"]+)', webpage, 'fmt_list').split(',')
r'"fmt_stream_map"\s*,\s*"([^"]+)', webpage,
'fmt stream map').split(',')
fmt_list = self._search_regex(
r'"fmt_list"\s*,\s*"([^"]+)', webpage, 'fmt_list').split(',')
resolutions = {}
for fmt in fmt_list:
@@ -97,10 +190,24 @@ class GoogleDriveIE(InfoExtractor):
formats.append(f)
self._sort_formats(formats)
hl = self._search_regex(
r'"hl"\s*,\s*"([^"]+)', webpage, 'hl', default=None)
subtitles_id = None
ttsurl = self._search_regex(
r'"ttsurl"\s*,\s*"([^"]+)', webpage, 'ttsurl', default=None)
if ttsurl:
# the video Id for subtitles will be the last value in the ttsurl
# query string
subtitles_id = ttsurl.encode('utf-8').decode(
'unicode_escape').split('=')[-1]
return {
'id': video_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'duration': duration,
'formats': formats,
'subtitles': self.extract_subtitles(video_id, subtitles_id, hl),
'automatic_captions': self.extract_automatic_captions(
video_id, subtitles_id, hl),
}

View File

@@ -215,3 +215,21 @@ class Laola1TvIE(Laola1TvEmbedIE):
'formats': formats,
'is_live': is_live,
}
class ITTFIE(InfoExtractor):
_VALID_URL = r'https?://tv\.ittf\.com/video/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'https://tv.ittf.com/video/peng-wang-wei-matsudaira-kenta/951802',
'only_matching': True,
}
def _real_extract(self, url):
return self.url_result(
update_url_query('https://www.laola1.tv/titanplayer.php', {
'videoid': self._match_id(url),
'type': 'V',
'lang': 'en',
'portal': 'int',
'customer': 1024,
}), Laola1TvEmbedIE.ie_key())

View File

@@ -72,15 +72,20 @@ class LiveLeakIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.liveleak.com/view?i=677_1439397581',
'info_dict': {
'id': '677_1439397581',
'title': 'Fuel Depot in China Explosion caught on video',
},
'playlist_count': 3,
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+src="https?://(?:\w+\.)?liveleak\.com/ll_embed\?(?:.*?)i=(?P<id>[\w_]+)(?:.*)',
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+src="(https?://(?:\w+\.)?liveleak\.com/ll_embed\?[^"]*[if]=[\w_]+[^"]+)"',
webpage)
if mobj:
return 'http://www.liveleak.com/view?i=%s' % mobj.group('id')
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -111,23 +116,54 @@ class LiveLeakIE(InfoExtractor):
'age_limit': age_limit,
}
info_dict = entries[0]
for idx, info_dict in enumerate(entries):
for a_format in info_dict['formats']:
if not a_format.get('height'):
a_format['height'] = int_or_none(self._search_regex(
r'([0-9]+)p\.mp4', a_format['url'], 'height label',
default=None))
for a_format in info_dict['formats']:
if not a_format.get('height'):
a_format['height'] = int_or_none(self._search_regex(
r'([0-9]+)p\.mp4', a_format['url'], 'height label',
default=None))
self._sort_formats(info_dict['formats'])
self._sort_formats(info_dict['formats'])
# Don't append entry ID for one-video pages to keep backward compatibility
if len(entries) > 1:
info_dict['id'] = '%s_%s' % (video_id, idx + 1)
else:
info_dict['id'] = video_id
info_dict.update({
'id': video_id,
'title': video_title,
'description': video_description,
'uploader': video_uploader,
'age_limit': age_limit,
'thumbnail': video_thumbnail,
})
info_dict.update({
'title': video_title,
'description': video_description,
'uploader': video_uploader,
'age_limit': age_limit,
'thumbnail': video_thumbnail,
})
return info_dict
return self.playlist_result(entries, video_id, video_title)
class LiveLeakEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?liveleak\.com/ll_embed\?.*?\b(?P<kind>[if])=(?P<id>[\w_]+)'
# See generic.py for actual test cases
_TESTS = [{
'url': 'https://www.liveleak.com/ll_embed?i=874_1459135191',
'only_matching': True,
}, {
'url': 'https://www.liveleak.com/ll_embed?f=ab065df993c1',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
kind, video_id = mobj.group('kind', 'id')
if kind == 'f':
webpage = self._download_webpage(url, video_id)
liveleak_url = self._search_regex(
r'logourl\s*:\s*(?P<q1>[\'"])(?P<url>%s)(?P=q1)' % LiveLeakIE._VALID_URL,
webpage, 'LiveLeak URL', group='url')
elif kind == 'i':
liveleak_url = 'http://www.liveleak.com/view?i=%s' % video_id
return self.url_result(liveleak_url, ie=LiveLeakIE.ie_key())

View File

@@ -9,6 +9,7 @@ from .common import InfoExtractor
from ..compat import (
compat_chr,
compat_ord,
compat_str,
compat_urllib_parse_unquote,
compat_urlparse,
)
@@ -53,15 +54,18 @@ class MixcloudIE(InfoExtractor):
'only_matching': True,
}]
_keys = [
'return { requestAnimationFrame: function(callback) { callback(); }, innerHeight: 500 };',
'pleasedontdownloadourmusictheartistswontgetpaid',
'window.addEventListener = window.addEventListener || function() {};',
'(function() { return new Date().toLocaleDateString(); })()'
]
_current_key = None
# See https://www.mixcloud.com/media/js2/www_js_2.9e23256562c080482435196ca3975ab5.js
def _decrypt_play_info(self, play_info, video_id):
KEYS = (
'pleasedontdownloadourmusictheartistswontgetpaid',
'window.addEventListener = window.addEventListener || function() {};',
'(function() { return new Date().toLocaleDateString(); })()',
)
play_info = base64.b64decode(play_info.encode('ascii'))
for num, key in enumerate(KEYS, start=1):
for num, key in enumerate(self._keys, start=1):
try:
return self._parse_json(
''.join([
@@ -69,7 +73,7 @@ class MixcloudIE(InfoExtractor):
for idx, ch in enumerate(play_info)]),
video_id)
except ExtractorError:
if num == len(KEYS):
if num == len(self._keys):
raise
def _real_extract(self, url):
@@ -80,6 +84,22 @@ class MixcloudIE(InfoExtractor):
webpage = self._download_webpage(url, track_id)
if not self._current_key:
js_url = self._search_regex(
r'<script[^>]+\bsrc=["\"](https://(?:www\.)?mixcloud\.com/media/js2/www_js_4\.[^>]+\.js)',
webpage, 'js url', default=None)
if js_url:
js = self._download_webpage(js_url, track_id, fatal=False)
if js:
KEY_RE_TEMPLATE = r'player\s*:\s*{.*?\b%s\s*:\s*(["\'])(?P<key>(?:(?!\1).)+)\1'
for key_name in ('value', 'key_value', 'key_value.*?', '.*?value.*?'):
key = self._search_regex(
KEY_RE_TEMPLATE % key_name, js, 'key',
default=None, group='key')
if key and isinstance(key, compat_str):
self._keys.insert(0, key)
self._current_key = key
message = self._html_search_regex(
r'(?s)<div[^>]+class="global-message cloudcast-disabled-notice-light"[^>]*>(.+?)<(?:a|/div)',
webpage, 'error message', default=None)

View File

@@ -80,18 +80,24 @@ class PeriscopeIE(PeriscopeBaseIE):
stream = self._call_api(
'getAccessPublic', {'broadcast_id': token}, token)
video_urls = set()
formats = []
for format_id in ('replay', 'rtmp', 'hls', 'https_hls'):
for format_id in ('replay', 'rtmp', 'hls', 'https_hls', 'lhls', 'lhlsweb'):
video_url = stream.get(format_id + '_url')
if not video_url:
if not video_url or video_url in video_urls:
continue
f = {
video_urls.add(video_url)
if format_id != 'rtmp':
formats.extend(self._extract_m3u8_formats(
video_url, token, 'mp4',
entry_protocol='m3u8_native'
if state in ('ended', 'timed_out') else 'm3u8',
m3u8_id=format_id, fatal=False))
continue
formats.append({
'url': video_url,
'ext': 'flv' if format_id == 'rtmp' else 'mp4',
}
if format_id != 'rtmp':
f['protocol'] = 'm3u8_native' if state in ('ended', 'timed_out') else 'm3u8'
formats.append(f)
})
self._sort_formats(formats)
return {

View File

@@ -18,6 +18,7 @@ from ..utils import (
parse_duration,
qualities,
srt_subtitles_timecode,
try_get,
update_url_query,
urlencode_postdata,
)
@@ -26,6 +27,39 @@ from ..utils import (
class PluralsightBaseIE(InfoExtractor):
_API_BASE = 'https://app.pluralsight.com'
def _download_course(self, course_id, url, display_id):
try:
return self._download_course_rpc(course_id, url, display_id)
except ExtractorError:
# Old API fallback
return self._download_json(
'https://app.pluralsight.com/player/user/api/v1/player/payload',
display_id, data=urlencode_postdata({'courseId': course_id}),
headers={'Referer': url})
def _download_course_rpc(self, course_id, url, display_id):
response = self._download_json(
'%s/player/functions/rpc' % self._API_BASE, display_id,
'Downloading course JSON',
data=json.dumps({
'fn': 'bootstrapPlayer',
'payload': {
'courseId': course_id,
},
}).encode('utf-8'),
headers={
'Content-Type': 'application/json;charset=utf-8',
'Referer': url,
})
course = try_get(response, lambda x: x['payload']['course'], dict)
if course:
return course
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, response['error']['message']),
expected=True)
class PluralsightIE(PluralsightBaseIE):
IE_NAME = 'pluralsight'
@@ -162,10 +196,7 @@ class PluralsightIE(PluralsightBaseIE):
display_id = '%s-%s' % (name, clip_id)
course = self._download_json(
'https://app.pluralsight.com/player/user/api/v1/player/payload',
display_id, data=urlencode_postdata({'courseId': course_name}),
headers={'Referer': url})
course = self._download_course(course_name, url, display_id)
collection = course['modules']
@@ -331,18 +362,7 @@ class PluralsightCourseIE(PluralsightBaseIE):
# TODO: PSM cookie
course = self._download_json(
'%s/player/functions/rpc' % self._API_BASE, course_id,
'Downloading course JSON',
data=json.dumps({
'fn': 'bootstrapPlayer',
'payload': {
'courseId': course_id,
}
}).encode('utf-8'),
headers={
'Content-Type': 'application/json;charset=utf-8'
})['payload']['course']
course = self._download_course(course_id, url, course_id)
title = course['title']
course_name = course['name']

View File

@@ -186,7 +186,7 @@ class PornHubIE(InfoExtractor):
title, thumbnail, duration = [None] * 3
video_uploader = self._html_search_regex(
r'(?s)From:&nbsp;.+?<(?:a href="/users/|a href="/channels/|span class="username)[^>]+>(.+?)<',
r'(?s)From:&nbsp;.+?<(?:a\b[^>]+\bhref=["\']/(?:user|channel)s/|span\b[^>]+\bclass=["\']username)[^>]+>(.+?)<',
webpage, 'uploader', fatal=False)
view_count = self._extract_count(

View File

@@ -2,38 +2,37 @@
from __future__ import unicode_literals
import random
import time
import re
import time
from .common import InfoExtractor
from ..utils import (
sanitized_Request,
strip_jsonp,
unescapeHTML,
clean_html,
ExtractorError,
strip_jsonp,
unescapeHTML,
)
class QQMusicIE(InfoExtractor):
IE_NAME = 'qqmusic'
IE_DESC = 'QQ音乐'
_VALID_URL = r'https?://y\.qq\.com/#type=song&mid=(?P<id>[0-9A-Za-z]+)'
_VALID_URL = r'https?://y\.qq\.com/n/yqq/song/(?P<id>[0-9A-Za-z]+)\.html'
_TESTS = [{
'url': 'http://y.qq.com/#type=song&mid=004295Et37taLD',
'md5': '9ce1c1c8445f561506d2e3cfb0255705',
'url': 'https://y.qq.com/n/yqq/song/004295Et37taLD.html',
'md5': '5f1e6cea39e182857da7ffc5ef5e6bb8',
'info_dict': {
'id': '004295Et37taLD',
'ext': 'mp3',
'title': '可惜没如果',
'release_date': '20141227',
'creator': '林俊杰',
'description': 'md5:d327722d0361576fde558f1ac68a7065',
'description': 'md5:d85afb3051952ecc50a1ee8a286d1eac',
'thumbnail': r're:^https?://.*\.jpg$',
}
}, {
'note': 'There is no mp3-320 version of this song.',
'url': 'http://y.qq.com/#type=song&mid=004MsGEo3DdNxV',
'url': 'https://y.qq.com/n/yqq/song/004MsGEo3DdNxV.html',
'md5': 'fa3926f0c585cda0af8fa4f796482e3e',
'info_dict': {
'id': '004MsGEo3DdNxV',
@@ -46,14 +45,14 @@ class QQMusicIE(InfoExtractor):
}
}, {
'note': 'lyrics not in .lrc format',
'url': 'http://y.qq.com/#type=song&mid=001JyApY11tIp6',
'url': 'https://y.qq.com/n/yqq/song/001JyApY11tIp6.html',
'info_dict': {
'id': '001JyApY11tIp6',
'ext': 'mp3',
'title': 'Shadows Over Transylvania',
'release_date': '19970225',
'creator': 'Dark Funeral',
'description': 'md5:ed14d5bd7ecec19609108052c25b2c11',
'description': 'md5:c9b20210587cbcd6836a1c597bab4525',
'thumbnail': r're:^https?://.*\.jpg$',
},
'params': {
@@ -105,7 +104,7 @@ class QQMusicIE(InfoExtractor):
[r'albummid:\'([0-9a-zA-Z]+)\'', r'"albummid":"([0-9a-zA-Z]+)"'],
detail_info_page, 'album mid', default=None)
if albummid:
thumbnail_url = "http://i.gtimg.cn/music/photo/mid_album_500/%s/%s/%s.jpg" \
thumbnail_url = 'http://i.gtimg.cn/music/photo/mid_album_500/%s/%s/%s.jpg' \
% (albummid[-2:-1], albummid[-1], albummid)
guid = self.m_r_get_ruin()
@@ -156,15 +155,39 @@ class QQPlaylistBaseIE(InfoExtractor):
def qq_static_url(category, mid):
return 'http://y.qq.com/y/static/%s/%s/%s/%s.html' % (category, mid[-2], mid[-1], mid)
@classmethod
def get_entries_from_page(cls, page):
def get_singer_all_songs(self, singmid, num):
return self._download_webpage(
r'https://c.y.qq.com/v8/fcg-bin/fcg_v8_singer_track_cp.fcg', singmid,
query={
'format': 'json',
'inCharset': 'utf8',
'outCharset': 'utf-8',
'platform': 'yqq',
'needNewCode': 0,
'singermid': singmid,
'order': 'listen',
'begin': 0,
'num': num,
'songstatus': 1,
})
def get_entries_from_page(self, singmid):
entries = []
for item in re.findall(r'class="data"[^<>]*>([^<>]+)</', page):
song_mid = unescapeHTML(item).split('|')[-5]
entries.append(cls.url_result(
'http://y.qq.com/#type=song&mid=' + song_mid, 'QQMusic',
song_mid))
default_num = 1
json_text = self.get_singer_all_songs(singmid, default_num)
json_obj_all_songs = self._parse_json(json_text, singmid)
if json_obj_all_songs['code'] == 0:
total = json_obj_all_songs['data']['total']
json_text = self.get_singer_all_songs(singmid, total)
json_obj_all_songs = self._parse_json(json_text, singmid)
for item in json_obj_all_songs['data']['list']:
if item['musicData'].get('songmid') is not None:
songmid = item['musicData']['songmid']
entries.append(self.url_result(
r'https://y.qq.com/n/yqq/song/%s.html' % songmid, 'QQMusic', songmid))
return entries
@@ -172,42 +195,32 @@ class QQPlaylistBaseIE(InfoExtractor):
class QQMusicSingerIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:singer'
IE_DESC = 'QQ音乐 - 歌手'
_VALID_URL = r'https?://y\.qq\.com/#type=singer&mid=(?P<id>[0-9A-Za-z]+)'
_VALID_URL = r'https?://y\.qq\.com/n/yqq/singer/(?P<id>[0-9A-Za-z]+)\.html'
_TEST = {
'url': 'http://y.qq.com/#type=singer&mid=001BLpXF2DyJe2',
'url': 'https://y.qq.com/n/yqq/singer/001BLpXF2DyJe2.html',
'info_dict': {
'id': '001BLpXF2DyJe2',
'title': '林俊杰',
'description': 'md5:870ec08f7d8547c29c93010899103751',
},
'playlist_count': 12,
'playlist_mincount': 12,
}
def _real_extract(self, url):
mid = self._match_id(url)
singer_page = self._download_webpage(
self.qq_static_url('singer', mid), mid, 'Download singer page')
entries = self.get_entries_from_page(singer_page)
entries = self.get_entries_from_page(mid)
singer_page = self._download_webpage(url, mid, 'Download singer page')
singer_name = self._html_search_regex(
r"singername\s*:\s*'([^']+)'", singer_page, 'singer name',
default=None)
singer_id = self._html_search_regex(
r"singerid\s*:\s*'([0-9]+)'", singer_page, 'singer id',
default=None)
r"singername\s*:\s*'(.*?)'", singer_page, 'singer name', default=None)
singer_desc = None
if singer_id:
req = sanitized_Request(
'http://s.plcloud.music.qq.com/fcgi-bin/fcg_get_singer_desc.fcg?utf8=1&outCharset=utf-8&format=xml&singerid=%s' % singer_id)
req.add_header(
'Referer', 'http://s.plcloud.music.qq.com/xhr_proxy_utf8.html')
if mid:
singer_desc_page = self._download_xml(
req, mid, 'Donwload singer description XML')
'http://s.plcloud.music.qq.com/fcgi-bin/fcg_get_singer_desc.fcg', mid,
'Donwload singer description XML',
query={'utf8': 1, 'outCharset': 'utf-8', 'format': 'xml', 'singermid': mid},
headers={'Referer': 'https://y.qq.com/n/yqq/singer/'})
singer_desc = singer_desc_page.find('./data/info/desc').text
@@ -217,10 +230,10 @@ class QQMusicSingerIE(QQPlaylistBaseIE):
class QQMusicAlbumIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:album'
IE_DESC = 'QQ音乐 - 专辑'
_VALID_URL = r'https?://y\.qq\.com/#type=album&mid=(?P<id>[0-9A-Za-z]+)'
_VALID_URL = r'https?://y\.qq\.com/n/yqq/album/(?P<id>[0-9A-Za-z]+)\.html'
_TESTS = [{
'url': 'http://y.qq.com/#type=album&mid=000gXCTb2AhRR1',
'url': 'https://y.qq.com/n/yqq/album/000gXCTb2AhRR1.html',
'info_dict': {
'id': '000gXCTb2AhRR1',
'title': '我们都是这样长大的',
@@ -228,7 +241,7 @@ class QQMusicAlbumIE(QQPlaylistBaseIE):
},
'playlist_count': 4,
}, {
'url': 'http://y.qq.com/#type=album&mid=002Y5a3b3AlCu3',
'url': 'https://y.qq.com/n/yqq/album/002Y5a3b3AlCu3.html',
'info_dict': {
'id': '002Y5a3b3AlCu3',
'title': '그리고...',
@@ -246,7 +259,7 @@ class QQMusicAlbumIE(QQPlaylistBaseIE):
entries = [
self.url_result(
'http://y.qq.com/#type=song&mid=' + song['songmid'], 'QQMusic', song['songmid']
'https://y.qq.com/n/yqq/song/' + song['songmid'] + '.html', 'QQMusic', song['songmid']
) for song in album['list']
]
album_name = album.get('name')
@@ -260,31 +273,30 @@ class QQMusicAlbumIE(QQPlaylistBaseIE):
class QQMusicToplistIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:toplist'
IE_DESC = 'QQ音乐 - 排行榜'
_VALID_URL = r'https?://y\.qq\.com/#type=toplist&p=(?P<id>(top|global)_[0-9]+)'
_VALID_URL = r'https?://y\.qq\.com/n/yqq/toplist/(?P<id>[0-9]+)\.html'
_TESTS = [{
'url': 'http://y.qq.com/#type=toplist&p=global_123',
'url': 'https://y.qq.com/n/yqq/toplist/123.html',
'info_dict': {
'id': 'global_123',
'id': '123',
'title': '美国iTunes榜',
},
'playlist_count': 10,
}, {
'url': 'http://y.qq.com/#type=toplist&p=top_3',
'info_dict': {
'id': 'top_3',
'title': '巅峰榜·欧美',
'description': 'QQ音乐巅峰榜·欧美根据用户收听行为自动生成集结当下最流行的欧美新歌:更新时间每周四22点|统'
'计周期:一周(上周四至本周三)|统计对象:三个月内发行的欧美歌曲|统计数量100首|统计算法:根据'
'歌曲在一周内的有效播放次数由高到低取前100名同一歌手最多允许5首歌曲同时上榜|有效播放次数:'
'登录用户完整播放一首歌曲记为一次有效播放同一用户收听同一首歌曲每天记录为1次有效播放'
'description': 'md5:89db2335fdbb10678dee2d43fe9aba08',
},
'playlist_count': 100,
}, {
'url': 'http://y.qq.com/#type=toplist&p=global_106',
'url': 'https://y.qq.com/n/yqq/toplist/3.html',
'info_dict': {
'id': 'global_106',
'id': '3',
'title': '巅峰榜·欧美',
'description': 'md5:5a600d42c01696b26b71f8c4d43407da',
},
'playlist_count': 100,
}, {
'url': 'https://y.qq.com/n/yqq/toplist/106.html',
'info_dict': {
'id': '106',
'title': '韩国Mnet榜',
'description': 'md5:cb84b325215e1d21708c615cac82a6e7',
},
'playlist_count': 50,
}]
@@ -292,18 +304,15 @@ class QQMusicToplistIE(QQPlaylistBaseIE):
def _real_extract(self, url):
list_id = self._match_id(url)
list_type, num_id = list_id.split("_")
toplist_json = self._download_json(
'http://i.y.qq.com/v8/fcg-bin/fcg_v8_toplist_cp.fcg?type=%s&topid=%s&format=json'
% (list_type, num_id),
list_id, 'Download toplist page')
'http://i.y.qq.com/v8/fcg-bin/fcg_v8_toplist_cp.fcg', list_id,
note='Download toplist page',
query={'type': 'toplist', 'topid': list_id, 'format': 'json'})
entries = [
self.url_result(
'http://y.qq.com/#type=song&mid=' + song['data']['songmid'], 'QQMusic', song['data']['songmid']
) for song in toplist_json['songlist']
]
entries = [self.url_result(
'https://y.qq.com/n/yqq/song/' + song['data']['songmid'] + '.html', 'QQMusic',
song['data']['songmid'])
for song in toplist_json['songlist']]
topinfo = toplist_json.get('topinfo', {})
list_name = topinfo.get('ListName')
@@ -314,10 +323,10 @@ class QQMusicToplistIE(QQPlaylistBaseIE):
class QQMusicPlaylistIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:playlist'
IE_DESC = 'QQ音乐 - 歌单'
_VALID_URL = r'https?://y\.qq\.com/#type=taoge&id=(?P<id>[0-9]+)'
_VALID_URL = r'https?://y\.qq\.com/n/yqq/playlist/(?P<id>[0-9]+)\.html'
_TESTS = [{
'url': 'http://y.qq.com/#type=taoge&id=3462654915',
'url': 'http://y.qq.com/n/yqq/playlist/3462654915.html',
'info_dict': {
'id': '3462654915',
'title': '韩国5月新歌精选下旬',
@@ -326,7 +335,7 @@ class QQMusicPlaylistIE(QQPlaylistBaseIE):
'playlist_count': 40,
'skip': 'playlist gone',
}, {
'url': 'http://y.qq.com/#type=taoge&id=1374105607',
'url': 'https://y.qq.com/n/yqq/playlist/1374105607.html',
'info_dict': {
'id': '1374105607',
'title': '易入人心的华语民谣',
@@ -339,8 +348,9 @@ class QQMusicPlaylistIE(QQPlaylistBaseIE):
list_id = self._match_id(url)
list_json = self._download_json(
'http://i.y.qq.com/qzone-music/fcg-bin/fcg_ucc_getcdinfo_byids_cp.fcg?type=1&json=1&utf8=1&onlysong=0&disstid=%s'
% list_id, list_id, 'Download list page',
'http://i.y.qq.com/qzone-music/fcg-bin/fcg_ucc_getcdinfo_byids_cp.fcg',
list_id, 'Download list page',
query={'type': 1, 'json': 1, 'utf8': 1, 'onlysong': 0, 'disstid': list_id},
transform_source=strip_jsonp)
if not len(list_json.get('cdlist', [])):
if list_json.get('code'):
@@ -350,11 +360,9 @@ class QQMusicPlaylistIE(QQPlaylistBaseIE):
raise ExtractorError('Unable to get playlist info')
cdlist = list_json['cdlist'][0]
entries = [
self.url_result(
'http://y.qq.com/#type=song&mid=' + song['songmid'], 'QQMusic', song['songmid']
) for song in cdlist['songlist']
]
entries = [self.url_result(
'https://y.qq.com/n/yqq/song/' + song['songmid'] + '.html', 'QQMusic', song['songmid'])
for song in cdlist['songlist']]
list_name = cdlist.get('dissname')
list_description = clean_html(unescapeHTML(cdlist.get('desc')))

View File

@@ -345,11 +345,11 @@ class RaiIE(RaiBaseIE):
media_type = media['type']
if 'Audio' in media_type:
relinker_info = {
'formats': {
'formats': [{
'format_id': media.get('formatoAudio'),
'url': media['audioUrl'],
'ext': media.get('formatoAudio'),
}
}]
}
elif 'Video' in media_type:
relinker_info = self._extract_relinker_info(media['mediaUri'], content_id)

View File

@@ -5,7 +5,6 @@ from .common import InfoExtractor
from ..utils import (
int_or_none,
js_to_json,
ExtractorError,
urlencode_postdata,
extract_attributes,
smuggle_url,
@@ -78,8 +77,10 @@ class TouTvIE(InfoExtractor):
def _real_extract(self, url):
path = self._match_id(url)
metadata = self._download_json('http://ici.tou.tv/presentation/%s' % path, path)
# IsDrm does not necessarily mean the video is DRM protected (see
# https://github.com/rg3/youtube-dl/issues/13994).
if metadata.get('IsDrm'):
raise ExtractorError('This video is DRM protected.', expected=True)
self.report_warning('This video is probably DRM protected.', path)
video_id = metadata['IdMedia']
details = metadata['Details']
title = details['OriginalTitle']

View File

@@ -74,7 +74,7 @@ class UdemyIE(InfoExtractor):
return compat_urlparse.urljoin(base_url, url) if not url.startswith('http') else url
checkout_url = unescapeHTML(self._search_regex(
r'href=(["\'])(?P<url>(?:https?://(?:www\.)?udemy\.com)?/payment/checkout/.+?)\1',
r'href=(["\'])(?P<url>(?:https?://(?:www\.)?udemy\.com)?/(?:payment|cart)/checkout/.+?)\1',
webpage, 'checkout url', group='url', default=None))
if checkout_url:
raise ExtractorError(

View File

@@ -1003,6 +1003,27 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'Skipping DASH manifest',
],
},
{
# The following content has been identified by the YouTube community
# as inappropriate or offensive to some audiences.
'url': 'https://www.youtube.com/watch?v=6SJNVb0GnPI',
'info_dict': {
'id': '6SJNVb0GnPI',
'ext': 'mp4',
'title': 'Race Differences in Intelligence',
'description': 'md5:5d161533167390427a1f8ee89a1fc6f1',
'duration': 965,
'upload_date': '20140124',
'uploader': 'New Century Foundation',
'uploader_id': 'UCEJYpZGqgUob0zVVEaLhvVg',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCEJYpZGqgUob0zVVEaLhvVg',
'license': 'Standard YouTube License',
'view_count': int,
},
'params': {
'skip_download': True,
},
},
{
# itag 212
'url': '1t24XAntNCY',
@@ -1437,9 +1458,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if dash_mpd and dash_mpd[0] not in dash_mpds:
dash_mpds.append(dash_mpd[0])
is_live = None
view_count = None
def extract_view_count(v_info):
return int_or_none(try_get(v_info, lambda x: x['view_count'][0]))
# Get video info
embed_webpage = None
is_live = None
if re.search(r'player-age-gate-content">', video_webpage) is not None:
age_gate = True
# We simulate the access to the video from www.youtube.com/v/{video_id}
@@ -1509,6 +1535,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
continue
get_video_info = compat_parse_qs(video_info_webpage)
add_dash_mpd(get_video_info)
if view_count is None:
view_count = extract_view_count(get_video_info)
if not video_info:
video_info = get_video_info
if 'token' in get_video_info:
@@ -1592,10 +1620,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
return self.playlist_result(entries, video_id, video_title, video_description)
self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
if 'view_count' in video_info:
view_count = int(video_info['view_count'][0])
else:
view_count = None
if view_count is None:
view_count = extract_view_count(get_video_info)
# Check for "rental" videos
if 'ypc_video_rental_bar_text' in video_info and 'author' not in video_info:

View File

@@ -596,7 +596,7 @@ def unescapeHTML(s):
assert type(s) == compat_str
return re.sub(
r'&([^;]+;)', lambda m: _htmlentity_transform(m.group(1)), s)
r'&([^&;]+;)', lambda m: _htmlentity_transform(m.group(1)), s)
def get_subprocess_encoding():

View File

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2017.08.13'
__version__ = '2017.08.27'