{"id":23534,"date":"2024-11-12T17:12:33","date_gmt":"2024-11-12T13:12:33","guid":{"rendered":"https:\/\/me-en.kaspersky.com\/blog\/?p=23534"},"modified":"2024-11-12T17:12:33","modified_gmt":"2024-11-12T13:12:33","slug":"how-to-use-web-archives-and-permanently-save-webpages","status":"publish","type":"post","link":"https:\/\/me-en.kaspersky.com\/blog\/how-to-use-web-archives-and-permanently-save-webpages\/23534\/","title":{"rendered":"How to save web pages for posterity"},"content":{"rendered":"<p>Contrary to the popular belief that anything online stays online, the internet doesn\u2019t remember everything. In a <a href=\"https:\/\/www.kaspersky.com\/blog\/how-to-backup-online-services-and-web-pages\/52214\/\" target=\"_blank\" rel=\"noopener nofollow\">previous post in this series<\/a>, we examined no fewer than nine scenarios in which you could lose access to online content. We also provided a detailed guide to what information you absolutely must (and preferably quickly) back up to your computer and how to do it. Today, we\u2019ll discuss how to easily save web pages to your computer, how to organize these archives, and what to do if your favorite site has gone AWOL.<\/p>\n<p>Let\u2019s say you want to save a blog post with a recipe, compile a bibliography for your research paper, or even preserve a specific online publication for legal purposes. All of the above are published as web pages \u2014 which have a tendency to disappear at the wrong moment. Want to reminisce about music news and gossip from 2005? Good luck with that \u2014 the <a href=\"https:\/\/www.hollywoodreporter.com\/tv\/tv-news\/mtv-news-shut-down-paramount-layoffs-kurt-loder-1235483098\/\" target=\"_blank\" rel=\"nofollow noopener\">MTV News site shut down and all its articles and interviews are no longer available<\/a>. Check references in Wikipedia articles? 
<a href=\"https:\/\/www.pewresearch.org\/data-labs\/2024\/05\/17\/when-online-content-disappears\/\" target=\"_blank\" rel=\"nofollow noopener\">11% of them lead nowhere<\/a>, even though they worked when they were cited. This phenomenon, known as \u201clink rot\u201d (the gradual deletion or relocation of online content), is rapidly becoming a major problem: <a href=\"https:\/\/www.pewresearch.org\/data-labs\/2024\/05\/17\/when-online-content-disappears\/\" target=\"_blank\" rel=\"nofollow noopener\">38% of pages<\/a> that existed ten years ago are no longer accessible today. So, if there\u2019s a web page out there that you like or need, the wise move is to back it up.<\/p>\n<h2>How to save a web page to your computer<\/h2>\n<p>Since a web page consists of dozens or even hundreds of files, backing it up requires a bit of effort. Here are the main ways to do it:<\/p>\n<p><strong>Save only the text as an HTML file.<\/strong> Select the \u201cSave page as\u2026\u201d menu command or button in your browser, and then choose \u201cWebpage, HTML Only\u201d. This saves only the page\u2019s HTML, that is, its text and markup, without any graphics or other eye candy.<\/p>\n<p><strong>Save text and images. <\/strong>The \u201cWebpage, Complete\u201d option creates an HTML file plus a folder with a matching name that contains the page\u2019s images, styles, and scripts. The downside is that all these auxiliary files clutter your drive. The \u201cWebpage, Single File\u201d option is more convenient: it bundles the page and all its resources into a single <em>.mhtml<\/em> file. Such files open without problems in Chrome and Edge, but other browsers may have trouble with them. 
Not every browser offers this option, but the <a href=\"https:\/\/github.com\/gildas-lormeau\/SingleFile\" target=\"_blank\" rel=\"nofollow noopener\">SingleFile<\/a> extension (available for most browsers) lets you save the entire page, media included, as a single HTML file that opens in any modern browser.<\/p>\n<p><strong>Print to PDF.<\/strong> If you mainly need the page\u2019s content, open your browser\u2019s print dialog and choose to save the page as a PDF. Many sites hide menus and banners when printed, so the result is often cleaner than the page itself, and a PDF opens on any computer.<\/p>\n<p><strong>Whichever option you choose, open the saved file and make sure the text you actually wanted to keep is still readable.<\/strong><\/p>\n<h2>An easier way to save a web page<\/h2>\n<p>The methods described above are a bit time-consuming and clutter your hard drive. For greater convenience, use a dedicated service such as <a href=\"https:\/\/getpocket.com\/\" target=\"_blank\" rel=\"nofollow noopener\">Pocket<\/a> (formerly Read It Later), <a href=\"https:\/\/wallabag.org\" target=\"_blank\" rel=\"nofollow noopener\">wallabag<\/a>, or <a href=\"https:\/\/raindrop.io\/pro\/buy\" target=\"_blank\" rel=\"nofollow noopener\">Raindrop.io<\/a>. They all work the same way: you send the service a link, and it retrieves the page with all its images, strips out everything unnecessary, and saves the result to your personal online storage. Even if the original page is later deleted or modified, the version you saved stays in your archive. These services let you group and sort your links, search the text of saved pages, and view them on any device. On desktop, there\u2019s an extension for all the major browsers; on mobile, there\u2019s an app.<\/p>\n<p>All these services offer an \u201ceternal\u201d archive only with a premium subscription, meaning you\u2019ll have to pay for the convenience. 
That said, wallabag is open source, so you can install it on your own server and neither pay for a third-party service nor worry about it shutting down.<\/p>\n<p>Some note-taking apps can also save complete web pages. These include <a href=\"https:\/\/evernote.com\/\" target=\"_blank\" rel=\"nofollow noopener\">Evernote<\/a>, where the feature is called \u201cWeb Clipper\u201d.<\/p>\n<h2>How to save a web page for others<\/h2>\n<p>If you need not just a copy for yourself but a specific version of a page that you can share with others, you\u2019ll need a public archiving service.<\/p>\n<p>The best known is the Internet Archive (<a href=\"https:\/\/archive.org\" target=\"_blank\" rel=\"nofollow noopener\">archive.org<\/a>) with its Wayback Machine. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Web_archiving\" target=\"_blank\" rel=\"noopener nofollow\">Other options<\/a> include <a href=\"https:\/\/archive.today\" target=\"_blank\" rel=\"nofollow noopener\">archive.today<\/a> (aka <a href=\"https:\/\/archive.is\/\" target=\"_blank\" rel=\"nofollow noopener\">archive.is<\/a>), <a href=\"https:\/\/perma.cc\" target=\"_blank\" rel=\"nofollow noopener\">perma.cc<\/a>, and <a href=\"https:\/\/megalodon.jp\" target=\"_blank\" rel=\"nofollow noopener\">megalodon.jp<\/a>. They all work on a similar principle: they visit web pages, either at a user\u2019s request or automatically, and store copies on their own servers.<\/p>\n<p>To request archiving of a web page, go to <a href=\"https:\/\/web.archive.org\/\" target=\"_blank\" rel=\"nofollow noopener\">web.archive.org<\/a> and enter the page\u2019s full address in the <strong><em>Save Page Now<\/em><\/strong> box. After you click <strong><em>Save<\/em><\/strong>, a window appears listing the page components as they load, followed by a permanent link to the page in its preserved state. 
It looks like this: <a href=\"https:\/\/web.archive.org\/web\/20240924045754\/https:\/\/www.kaspersky.com\/blog\" target=\"_blank\" rel=\"nofollow noopener\">https:\/\/web.archive.org\/web\/20240924045754\/https:\/\/www.kaspersky.com\/blog<\/a>. The link contains both the address of the saved page and the exact time it was saved: the 14-digit number is a timestamp in year, month, day, hours, minutes, seconds order (UTC), so this copy was made on September 24, 2024, at 04:57:54. That makes such links perfect for archival purposes.<\/p>\n<p>Registering on <a href=\"https:\/\/archive.org\" target=\"_blank\" rel=\"nofollow noopener\">archive.org<\/a> lets you manage a collection of such links, take screenshots of saved sites, and download copies of them in a special web-archiving format.<\/p>\n<div id=\"attachment_52593\" style=\"width: 2352px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/37\/2024\/11\/12165509\/how-to-use-web-archives-and-permanently-save-webpages-01.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-52593\" class=\"size-full wp-image-52593\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/37\/2024\/11\/12165509\/how-to-use-web-archives-and-permanently-save-webpages-01.jpg\" alt=\"On archive.org, you can view previously saved versions of websites and save the current state of any site \u2014 for example, our blog\" width=\"2342\" height=\"1396\"><\/a><p id=\"caption-attachment-52593\" class=\"wp-caption-text\">On archive.org, you can view previously saved versions of websites and save the current state of any site \u2014 for example, our blog<\/p><\/div>\n<p>When you open an archive link, you\u2019ll see the saved page with a timestamp indicating when the snapshot was taken. This is useful for tracking and demonstrating changes to website content: price fluctuations, product description updates, edited news reports, and deleted information. The latter is particularly important for historians and cultural researchers who study defunct websites. 
Below, you can check out one of the first versions of GeoCities, a once-popular web-hosting service that let people create \u201chome pages\u201d, express themselves, and find friends with shared interests long before social networks. It\u2019s only thanks to the Wayback Machine that we can see it now: the service was shut down in 2009.<\/p>\n<div id=\"attachment_52592\" style=\"width: 2356px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/37\/2024\/11\/12165535\/how-to-use-web-archives-and-permanently-save-webpages-02.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-52592\" class=\"size-full wp-image-52592\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/37\/2024\/11\/12165535\/how-to-use-web-archives-and-permanently-save-webpages-02.jpg\" alt=\"A gift for the old-timers: one of the earliest versions of GeoCities.com\" width=\"2346\" height=\"1360\"><\/a><p id=\"caption-attachment-52592\" class=\"wp-caption-text\">A gift for the old-timers: one of the earliest versions of GeoCities.com<\/p><\/div>\n<h2>How to find deleted internet content or an old version of a website<\/h2>\n<p>To view an old version of any website:<\/p>\n<ul>\n<li>Open <a href=\"https:\/\/web.archive.org\/\" target=\"_blank\" rel=\"nofollow noopener\">archive.org<\/a>.<\/li>\n<li>Enter the full address of the website or a specific page in the box next to the logo and press Enter. If you don\u2019t know the exact URL, you can enter the website\u2019s name or a few words that describe it well.<\/li>\n<li>Select the desired website from the list. The results show at a glance how many copies are archived and for what period.<\/li>\n<li>Use the calendar to choose which of the saved copies you wish to view. Dates with a saved copy are circled; the larger the circle, the more copies were made that day.<\/li>\n<li>Click the desired date and inspect the saved site. 
Note that loading a copy from the archive may take a few minutes.<\/li>\n<li>The timeline graph above the archived copy lets you navigate to older and newer copies.<\/li>\n<\/ul>\n<div id=\"attachment_52591\" style=\"width: 2384px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/37\/2024\/11\/12165601\/how-to-use-web-archives-and-permanently-save-webpages-03.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-52591\" class=\"size-full wp-image-52591\" src=\"https:\/\/media.kasperskydaily.com\/wp-content\/uploads\/sites\/37\/2024\/11\/12165601\/how-to-use-web-archives-and-permanently-save-webpages-03.jpg\" alt=\"How to explore old versions of sites at web.archive.org\" width=\"2374\" height=\"2047\"><\/a><p id=\"caption-attachment-52591\" class=\"wp-caption-text\">How to explore old versions of sites at web.archive.org<\/p><\/div>\n<p>To return to the archived copy later without going through the search interface, copy its link from the address bar.<\/p>\n<h2>What if archive.org can\u2019t help<\/h2>\n<p>The Internet Archive, the nonprofit behind <a href=\"https:\/\/archive.org\/\" target=\"_blank\" rel=\"nofollow noopener\">archive.org<\/a>, sometimes complies with requests from copyright holders and other authorized parties to exclude certain sites from the Wayback Machine. Also, the service never aimed to preserve the entire internet, so the page you need may simply never have been archived. In such cases, try looking for it in other time capsules.<\/p>\n<p><a href=\"https:\/\/archive.today\/\" target=\"_blank\" rel=\"nofollow noopener\">Archive.today<\/a> (aka <a href=\"https:\/\/archive.is\/\" target=\"_blank\" rel=\"nofollow noopener\">archive.is<\/a>) doesn\u2019t save pages automatically; it does so only at the request of users. 
Among other things, this means it doesn\u2019t have to follow the instructions that websites give to search robots (robots.txt), so its archive contains documents that aren\u2019t available in the Wayback Machine.<\/p>\n<p>Another important web-archiving project is <a href=\"https:\/\/perma.cc\" target=\"_blank\" rel=\"nofollow noopener\">perma.cc<\/a>, created by a consortium of major world libraries. However, it\u2019s free only for participating organizations; individual users can subscribe to a paid plan, with pricing based on the number of archived links.<\/p>\n<p>Search engines\u2019 cached content is a powerful alternative to specialized archives. To index a web page, a search engine has to retrieve its text, so a crude but readable version of almost any page can be found in its cache. For a long time, Google\u2019s cache was the most accessible, but in early 2024, the search giant <a href=\"https:\/\/gizmodo.com\/google-has-officially-kiled-cache-links-1851220408\" target=\"_blank\" rel=\"nofollow noopener\">removed the direct link to its cache<\/a> from search results. The service still works, but accessing it directly is very difficult.<\/p>\n<p>That\u2019s why it\u2019s better to use <strong>browser extensions<\/strong> that make internet archives easier to work with. 
For example, if a link takes you to a deleted page or a defunct website, the <a href=\"https:\/\/github.com\/dessant\/web-archives\" target=\"_blank\" rel=\"nofollow noopener\">Web Archives<\/a> extension can take you straight to an archived copy of that page at <a href=\"https:\/\/web.archive.org\/\" target=\"_blank\" rel=\"nofollow noopener\">web.archive.org<\/a>, <a href=\"https:\/\/archive.today\/\" target=\"_blank\" rel=\"nofollow noopener\">archive.today<\/a>, or <a href=\"https:\/\/perma.cc\" target=\"_blank\" rel=\"nofollow noopener\">perma.cc<\/a>, or show a cached version of it from Google, Bing, or Yandex.<\/p>\n<h2>How to save data from other online services<\/h2>\n<p>Besides web pages, many online services, from photo albums and notes to social networks, hold data <a href=\"https:\/\/www.kaspersky.com\/blog\/how-to-backup-online-services-and-web-pages\/52214\/\" target=\"_blank\" rel=\"noopener nofollow\">you may also want to save<\/a>. Of course, recommendations vary by data type and service, but for your convenience, we\u2019ve grouped all related instructions under the <a href=\"https:\/\/www.kaspersky.com\/blog\/tag\/backup\/\" target=\"_blank\" rel=\"noopener nofollow\">backup<\/a> tag. 
You can read about creating backups for:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.kaspersky.com\/blog\/notion-backup-and-migration-guide\/52076\/\" target=\"_blank\" rel=\"noopener nofollow\">Notion<\/a><\/li>\n<li><a href=\"https:\/\/www.kaspersky.com\/blog\/telegram-privacy-security-backup-aug2024\/52051\/\" target=\"_blank\" rel=\"noopener nofollow\">Telegram<\/a><\/li>\n<li><a href=\"https:\/\/www.kaspersky.com\/blog\/whatsapp-backup-google-drive\/23627\/\" target=\"_blank\" rel=\"noopener nofollow\">WhatsApp<\/a><\/li>\n<li>2FA <a href=\"https:\/\/www.kaspersky.com\/blog\/how-to-backup-authenticator-app\/42103\/\" target=\"_blank\" rel=\"noopener nofollow\">authenticator apps<\/a><\/li>\n<li><a href=\"https:\/\/www.kaspersky.com\/blog\/how-to-backup-online-services-and-web-pages\/52214\/\" target=\"_blank\" rel=\"noopener nofollow\">Other services<\/a><\/li>\n<\/ul>\n<p>And don\u2019t forget to <a href=\"https:\/\/me-en.kaspersky.com\/premium?icid=me-en_bb2022-kdplacehd_acq_ona_smm__onl_b2c_kdaily_lnk_sm-team___kprem___\" target=\"_blank\" rel=\"noopener\">safeguard your backups<\/a>\u00a0against ransomware and spyware!<\/p>\n<input type=\"hidden\" class=\"category_for_banner\" value=\"premium-geek\">\n","protected":false},"excerpt":{"rendered":"<p>Web pages often disappear, move, or change content. 
How to keep them the way you want, or easily locate a web archive?<\/p>\n","protected":false},"author":2722,"featured_media":23537,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[9],"tags":[557,43,939,2792],"class_list":{"0":"post-23534","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-tips","8":"tag-backup","9":"tag-privacy","10":"tag-web","11":"tag-web-pages"},"hreflang":[{"hreflang":"en-ae","url":"https:\/\/me-en.kaspersky.com\/blog\/how-to-use-web-archives-and-permanently-save-webpages\/23534\/"},{"hreflang":"en-in","url":"https:\/\/www.kaspersky.co.in\/blog\/how-to-use-web-archives-and-permanently-save-webpages\/28281\/"},{"hreflang":"ar","url":"https:\/\/me.kaspersky.com\/blog\/how-to-use-web-archives-and-permanently-save-webpages\/12144\/"},{"hreflang":"en-gb","url":"https:\/\/www.kaspersky.co.uk\/blog\/how-to-use-web-archives-and-permanently-save-webpages\/28419\/"},{"hreflang":"es-mx","url":"https:\/\/latam.kaspersky.com\/blog\/how-to-use-web-archives-and-permanently-save-webpages\/27801\/"},{"hreflang":"es","url":"https:\/\/www.kaspersky.es\/blog\/how-to-use-web-archives-and-permanently-save-webpages\/30537\/"},{"hreflang":"it","url":"https:\/\/www.kaspersky.it\/blog\/how-to-use-web-archives-and-permanently-save-webpages\/29286\/"},{"hreflang":"ru","url":"https:\/\/www.kaspersky.ru\/blog\/how-to-use-web-archives-and-permanently-save-webpages\/38517\/"},{"hreflang":"tr","url":"https:\/\/www.kaspersky.com.tr\/blog\/how-to-use-web-archives-and-permanently-save-webpages\/12929\/"},{"hreflang":"x-default","url":"https:\/\/www.kaspersky.com\/blog\/how-to-use-web-archives-and-permanently-save-webpages\/52587\/"},{"hreflang":"fr","url":"https:\/\/www.kaspersky.fr\/blog\/how-to-use-web-archives-and-permanently-save-webpages\/22367\/"},{"hreflang":"pt-br","url":"https:\/\/www.kaspersky.com.br\/blog\/
how-to-use-web-archives-and-permanently-save-webpages\/23119\/"},{"hreflang":"ru-kz","url":"https:\/\/blog.kaspersky.kz\/how-to-use-web-archives-and-permanently-save-webpages\/28496\/"},{"hreflang":"en-au","url":"https:\/\/www.kaspersky.com.au\/blog\/how-to-use-web-archives-and-permanently-save-webpages\/34374\/"},{"hreflang":"en-za","url":"https:\/\/www.kaspersky.co.za\/blog\/how-to-use-web-archives-and-permanently-save-webpages\/33999\/"}],"acf":[],"banners":"","maintag":{"url":"https:\/\/me-en.kaspersky.com\/blog\/tag\/backup\/","name":"backup"},"_links":{"self":[{"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/23534","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/users\/2722"}],"replies":[{"embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/comments?post=23534"}],"version-history":[{"count":2,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/23534\/revisions"}],"predecessor-version":[{"id":23538,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/posts\/23534\/revisions\/23538"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/media\/23537"}],"wp:attachment":[{"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/media?parent=23534"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/categories?post=23534"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/me-en.kaspersky.com\/blog\/wp-json\/wp\/v2\/tags?post=23534"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}