GitHub - Y2Z/monolith: ⬛️ CLI tool for saving complete web pages as a single HTML file
A data hoarder’s dream come true: bundle any web page into a single HTML file. You can finally replace that gazillion of open tabs with a gazillion of .html files stored somewhere on your precious little drive.
Unlike the conventional “Save page as”, monolith not only saves the target document, it embeds CSS, image, and JavaScript assets all at once, producing a single HTML5 document that is a joy to store and share.
If compared to saving websites with wget -mpk, this tool embeds all assets as data URLs and therefore lets browsers render the saved page exactly the way it was on the Internet, even when no network connection is available.
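To make the data URL idea concrete, here is a minimal sketch of how a single asset can be turned into a data URL by hand. The helper name to_data_url is hypothetical and not part of monolith, and base64 is only one common encoding for data URLs; the exact encoding monolith picks for each asset type is not stated here.

```shell
#!/bin/sh
# Sketch: turn a local file into a data URL, the form in which a bundled
# asset can live inside an HTML document. to_data_url is a hypothetical
# helper for illustration, not part of monolith.
to_data_url() {
    mime="$1"
    file="$2"
    # base64 wraps long output on some systems; strip newlines so the
    # URL stays on one line.
    printf 'data:%s;base64,%s\n' "$mime" "$(base64 < "$file" | tr -d '\n')"
}

# Demo: a tiny stylesheet becomes a self-contained href value.
css_file="$(mktemp)"
printf 'body{margin:0}' > "$css_file"
printf '<link rel="stylesheet" href="%s">\n' "$(to_data_url text/css "$css_file")"
rm -f "$css_file"
```

Because the stylesheet now travels inside the href attribute itself, the browser has nothing left to fetch, which is why such a page keeps rendering without a network connection.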
Installation
Using Cargo (cross-platform)
Via Homebrew (macOS and GNU/Linux)
Via Chocolatey (Windows)
Via Scoop (Windows)
scoop install main/monolith
Via MacPorts (macOS)
sudo port install monolith
Using Snapcraft (GNU/Linux)
Using Guix (GNU/Linux)
Using AUR (Arch Linux)
Using aports (Alpine Linux)
Using FreeBSD packages (FreeBSD)
Using FreeBSD ports (FreeBSD)
cd /usr/ports/www/monolith/
make install clean
Using pkgsrc (NetBSD, OpenBSD, Haiku, etc)
cd /usr/pkgsrc/www/monolith
make install clean
Using containers
docker build -t y2z/monolith .
sudo install -b dist/run-in-container.sh /usr/local/bin/monolith
From source
Dependencies: libssl, cargo
Install cargo (GNU/Linux)
Check if cargo is installed
If cargo is not already installed, install and add it to your existing $PATH (paraphrasing the official installation instructions):
curl https://sh.rustup.rs -sSf | sh
. "$HOME/.cargo/env"
Proceed with installing from source:
git clone https://github.com/Y2Z/monolith.git
cd monolith
make install
Using pre-built binaries (Windows, ARM-based devices, etc)
Every release contains pre-built binaries for Windows, GNU/Linux, as well as platforms with non-standard CPU architecture.
Usage
monolith https://lyrics.github.io/db/P/Portishead/Dummy/Roads/ -o portishead-roads-lyrics.html
cat index.html | monolith -aIiFfcMv -b https://original.site/ - > result.html
Options
-a: Exclude audio sources
-b: Use custom base URL
-B: Forbid retrieving assets from specified domain(s)
-c: Exclude CSS
-C: Read cookies from file
-d: Allow retrieving assets only from specified domain(s)
-e: Ignore network errors
-E: Save document using custom encoding
-f: Omit frames
-F: Exclude web fonts
-h: Print help information
-i: Remove images
-I: Isolate the document
-j: Exclude JavaScript
-k: Accept invalid X.509 (TLS) certificates
-M: Don't add timestamp and URL information
-n: Extract contents of NOSCRIPT elements
-o: Write output to file (use “-” for STDOUT)
-s: Be quiet
-t: Adjust network request timeout
-u: Provide custom User-Agent
-v: Exclude videos
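Several of the options above can be combined in one call. The following is a rough sketch of a wrapper that saves a lean, text-focused copy of a page. The function name save_lean and the MONOLITH_BIN variable are hypothetical and exist only for this example; only the flags themselves come from the list above.

```shell
#!/bin/sh
# Sketch: save a lean copy of a page using options from the list above.
# save_lean is a hypothetical helper; MONOLITH_BIN is a hypothetical
# override (set it to "echo" for a dry run that prints the arguments
# instead of running monolith).
save_lean() {
    url="$1"
    out="$2"
    # -j: exclude JavaScript, -F: exclude web fonts, -v: exclude videos,
    # -a: exclude audio sources, -s: be quiet, -o: write output to file
    "${MONOLITH_BIN:-monolith}" -j -F -v -a -s "$url" -o "$out"
}
```

A dry run such as `MONOLITH_BIN=echo save_lean https://example.com example.html` shows the exact arguments that would be passed, which is a cheap way to check a flag combination before fetching anything.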
Whitelisting and blacklisting domains
Options -d and -B control which domains assets may be retrieved from, e.g.:
monolith -I -d example.com -d www.example.com https://example.com -o example-only.html
monolith -I -B -d .googleusercontent.com -d googleanalytics.com -d .google.com https://example.com -o example-no-ads.html
Dynamic content
Monolith doesn't feature a JavaScript engine, hence websites that retrieve and display data after initial load may require usage of additional tools.
For example, Chromium (Chrome) can be used to act as a pre-processor for such pages:
chromium --headless --incognito --dump-dom https://github.com | monolith - -I -b https://github.com -o github.html
Proxies
Please set https_proxy, http_proxy, and no_proxy environment variables.
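As an illustration, the variables could be exported in the shell before invoking monolith. This is only a sketch: the proxy address proxy.example.internal:3128 is a placeholder, and the comma-separated form of no_proxy follows the common convention rather than anything stated in this document.

```shell
#!/bin/sh
# Sketch: route requests through a proxy via environment variables.
# proxy.example.internal:3128 is a placeholder, not a real endpoint.
export https_proxy="http://proxy.example.internal:3128"
export http_proxy="http://proxy.example.internal:3128"
# Hosts that should bypass the proxy (comma-separated by convention).
export no_proxy="localhost,127.0.0.1"

# A later call in the same shell would then inherit these settings, e.g.:
#   monolith https://example.com -o example.html
```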
Contributing
Please open an issue if something is wrong; it helps make this project better.
Related projects
- Monolith Chrome Extension: https://github.com/rhysd/monolith-of-web
- Pagesaver: https://github.com/distributed-mind/pagesaver
- Personal WayBack Machine: https://github.com/popey/pwbm
- Hako: https://github.com/dmpop/hako
- Monk: https://github.com/monk-dev/monk
License
To the extent possible under law, the author(s) have dedicated all copyright related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty.
Keep in mind that monolith is not aware of your browser’s session.
Source: GitHub