Does Lemmy have any communities dedicated to archiving/hoarding data?
For Wikipedia you’ll want to use Kiwix. A full backup of Wikipedia is only like 100 GB, and I think that includes pictures too.
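If it helps, the no-frills version of that is just two commands. A minimal sketch, assuming you want the full English “maxi” ZIM (the exact filename changes with each release, so check https://download.kiwix.org/zim/wikipedia/ for the current one):

```
# grab the English Wikipedia ZIM (resumable) and serve it locally with kiwix-serve
wget -c https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
kiwix-serve --port 8080 wikipedia_en_all_maxi_2024-01.zim
# then browse to http://localhost:8080/
```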
Last time I updated, it was closer to 120 GB, but if you’re not sweating 100 GB then an extra 20 isn’t going to bother anyone these days.
Also, thanks for reminding me that I need to check my dates and update.
EDIT: you can also easily configure an SBC like a Raspberry Pi (or any of the clones) so that it boots, sets the Wi-Fi to access point mode, and serves Kiwix as a website that anyone on the local AP Wi-Fi network can connect to and query… And it’ll run off a USB battery pack. I have one kicking around the house somewhere.
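For anyone who wants to replicate that, a rough sketch of the two moving parts, assuming a Pi running NetworkManager (the SSID, password, and ZIM path here are placeholders):

```
# turn the Pi's Wi-Fi into an access point (NetworkManager handles DHCP/NAT)
nmcli device wifi hotspot ifname wlan0 ssid OfflineLibrary password "changeme123"

# serve the ZIM file(s) to anyone who joins the hotspot
kiwix-serve --port 8080 /media/usb/wikipedia_en_all_maxi_2024-01.zim
# clients browse to http://10.42.0.1:8080/ (NetworkManager's default shared-mode address)
```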
Do you recommend adding anything else to it?
For instance, OSM maps?
I’ve been thinking about running the Kiwix app + OSMAnd on an old Android phone and auto-updating it once a year.
Yeah, and if you make a Zim wiki or convert a website into ZIM, then you can run that stuff too. If you use Emacs, it’s easy to convert some pages to wikitext for Zim as well.
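On the website-to-ZIM part: the openzim project publishes a crawler called zimit as a container image that does this. A hedged sketch from memory (check its --help, since the image name and flag names may have changed between versions):

```
# crawl a site and write it out as a ZIM; run --help first to confirm the options
mkdir -p output
docker run -v "$PWD/output:/output" ghcr.io/openzim/zimit zimit --help
# ...then re-run with the options it lists, typically a seed URL plus a --name
# for the resulting ZIM, e.g. something along the lines of:
#   ... zimit --url https://example.org --name example-org
```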
120GB not including Wikimedia 😉
Also, I wish they included OSM maps, not just the wiki.
You can easily download planet.osm. The compressed PBF planet file is well under 100 GB these days; it’s the uncompressed XML that runs to a couple of TB.
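For reference, the planet file lives on planet.openstreetmap.org; a hedged sketch, where “planet-latest” is what I remember the rolling link being called:

```
# resumable download of the current planet extract in PBF form (roughly 75-80 GB)
wget -c https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf
# torrents are published alongside the files and are kinder to the OSM mirrors
```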
deleted by creator
“backups”? Pray tell, fine sir and/or madam, what is that?
You know there are only two kinds of people: those who do backups and those who haven’t lost a hard drive/data before. Also: RAID is not a backup.
I have been archiving Linux builds for the last 20 years, so I can effectively install Linux on almost any hardware going back to 1998-ish.
I have been archiving Docker images to my locally hosted GitLab server for the past 3-5 years (not sure when I started, tbh). I’ve got around 100 GB of images, ranging from base images like OSes to full app images like Plex, ffmpeg, etc.
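For anyone copying the idea, the mechanics are just pull/tag/push against the self-hosted registry; a sketch where gitlab.example.lan and the project path are placeholders:

```
# mirror an upstream image into a self-hosted GitLab container registry
docker pull linuxserver/plex:latest
docker tag  linuxserver/plex:latest gitlab.example.lan:5050/archive/plex:latest
docker push gitlab.example.lan:5050/archive/plex:latest

# or keep a flat-file copy that restores later with `docker load`
docker save linuxserver/plex:latest | gzip > plex-latest.tar.gz
```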
I also have been archiving FOSS projects into my GitLab and have been using pipelines to ensure they remain up to date.
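The repo side boils down to `git clone --mirror` plus a scheduled push; a rough sketch (the GitLab host is a placeholder, FFmpeg is just an example upstream):

```
# one-time setup: a bare mirror clone of the upstream project
git clone --mirror https://github.com/FFmpeg/FFmpeg.git ffmpeg.git

# what the pipeline or cron job repeats on a schedule
cd ffmpeg.git
git remote update --prune        # fetch new upstream refs, drop deleted ones
git push --mirror git@gitlab.example.lan:mirrors/ffmpeg.git
```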
The only thing I lack is packages from package managers like pip, bundler, npm, yum/dnf, and apt. There’s just so much to cache that it’s nigh impossible to get everything archived.
I have even set up my own local CDN for JS imports in HTML. I use rewrite rules in nginx to redirect them to my local sources.
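If anyone wants to copy that, the serving side can be a single nginx server block fronting a local directory; a hedged sketch (the hostname and paths are made up, and rewriting the URLs in the pages themselves is a separate step), written as a heredoc so it stays copy-pasteable:

```
# map a local "CDN" hostname onto mirrored copies of the JS libraries
sudo tee /etc/nginx/conf.d/local-cdn.conf >/dev/null <<'EOF'
server {
    listen 80;
    server_name cdn.lan;        # resolve this to the box via local DNS or /etc/hosts
    root /srv/local-cdn;        # e.g. /srv/local-cdn/jquery/3.7.1/jquery.min.js

    location / {
        try_files $uri =404;    # only serve what has actually been mirrored
        add_header Cache-Control "public, max-age=31536000";
    }
}
EOF
sudo nginx -t && sudo systemctl reload nginx
```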
My goal is to be as self-sustaining on local hosting as possible.
FWIW:
```
fabien@debian2080ti:/media/fabien/slowdisk$ ls -lhS offline_prep/
total 341G
-rw-r--r-- 1 fabien fabien 103G Jul  6  2024 wikipedia_en_all_maxi_2024-01.zim
-rw-r--r-- 1 fabien fabien  81G Apr 22  2023 gutenberg_mul_all_2023-04.zim
-rw-r--r-- 1 fabien fabien  75G Jul  7  2024 stackoverflow.com_en_all_2023-11.zim
-rw-r--r-- 1 fabien fabien  74G Mar 10  2024 planet-240304.osm.pbf
-rw-r--r-- 1 fabien fabien 3.8G Oct 18 06:55 debian-13.1.0-amd64-DVD-1.iso
-rw-r--r-- 1 fabien fabien 2.6G May  7  2023 ifixit_en_all_2023-04.zim
-rw-r--r-- 1 fabien fabien 1.6G May  7  2023 developer.mozilla.org_en_all_2023-02.zim
-rw-r--r-- 1 fabien fabien 931M May  7  2023 diy.stackexchange.com_en_all_2023-03.zim
-rw-r--r-- 1 fabien fabien 808M Jun  5  2023 wikivoyage_en_all_maxi_2023-05.zim
-rw-r--r-- 1 fabien fabien 296M Apr 30  2023 raspberrypi.stackexchange.com_en_all_2022-11.zim
-rw-r--r-- 1 fabien fabien 131M May  7  2023 rapsberry_pi_docs_2023-01.zim
-rw-r--r-- 1 fabien fabien 100M May  7  2023 100r-off-the-grid_en_2022-06.zim
-rw-r--r-- 1 fabien fabien  61M May  7  2023 quantumcomputing.stackexchange.com_en_all_2022-11.zim
-rw-r--r-- 1 fabien fabien  45M May  7  2023 computergraphics.stackexchange.com_en_all_2022-11.zim
-rw-r--r-- 1 fabien fabien  37M May  7  2023 wordnet_en_all_2023-04.zim
-rw-r--r-- 1 fabien fabien  23M Jul 17  2023 kiwix-tools_linux-armv6-3.5.0-1.tar.gz
-rw-r--r-- 1 fabien fabien  16M Oct  6 21:32 be-stib-gtfs.zip
-rw-r--r-- 1 fabien fabien 3.8M Oct  6 21:32 be-sncb-gtfs.zip
-rw-r--r-- 1 fabien fabien 2.3M May  7  2023 termux_en_all_maxi_2022-12.zim
-rw-r--r-- 1 fabien fabien 1.9M May  7  2023 kiwix-firefox_3.8.0.xpi
```

…but if you want the easier version, just get Kiwix on whatever device is in front of you right now (yes, even a mobile phone, assuming you have the space), then get whatever content you need.
If you need a bit of help, I recorded “TechSovereignty at home, episode 11 - Offline Wikipedia, Kiwix and checksums” with a friend just 3 weeks ago.
I also wrote (and randomly update) https://fabien.benetou.fr/Content/Vademecum and coded https://git.benetou.fr/utopiah/offline-octopus but tbh KDE Connect is much better now.
The point though is that having such a repository takes minutes. If you don’t have the space, buy a 512 GB microSD for 50 EUR, put the content on it, stuff it in a drawer, and move on. If you want to, update it every 3 months or whenever you feel like it.
TL;DR: it takes longer to write such a meme than to actually do it.
The English-language Wikipedia probably wouldn’t be hard, or Debian Stable.
All of Debian’s packages might be a tad more expensive, though.
It depends on whether you want the images or previous revisions of Wikipedia too. The current version is about 25 GB compressed; the dump with all revisions is apparently multiple terabytes. They don’t say how much media they have, but I’m guessing it’s roughly “lots”.
This might be a good place to start for Wikipedia:
https://meta.wikimedia.org/wiki/Data_dump_torrents#English_Wikipedia
And the English one with no pictures is even smaller.
And you can use Kiwix to set up a locally hosted Wikipedia using the data dumps.
Neither is that bad, honestly. I have jigdo scripts I run with every point release of Debian, and I have a copy of English Wikipedia on a Kiwix mirror I also host. Wikipedia is a tad over 100 GB. The source, arm64, and amd64 complete repos (DVD images) for Debian Trixie, including the network installer and a couple of live boot images, are 353 GB.
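For anyone who hasn’t used jigdo, the script mostly boils down to feeding jigdo-lite the .jigdo file for each image; a sketch with what I believe is the current cdimage.debian.org layout (double-check the path for the point release you actually want):

```
# rebuild a Debian DVD image from its .jigdo file plus a package mirror of your choice
jigdo-lite https://cdimage.debian.org/debian-cd/current/amd64/jigdo-dvd/debian-13.1.0-amd64-DVD-1.jigdo
# jigdo-lite downloads the template, fetches the listed .debs from a mirror, and
# reassembles the ISO, so point releases only re-download what actually changed
```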
Kiwix has copies of a LOT of stuff on their website, including Wikipedia. You can view their ZIM files with a desktop application or host your own web version. Their website is: https://kiwix.org/
If you want (or if Wikipedia is censored for you) you can also look at my mirror to see what a web hosted version looks like: https://kiwix.marcusadams.me/
Note: I use Anubis to help block scrapers. You should have no issues as a human, other than that you may see a little anime girl for a second on first load, but every once in a while Brave has a disagreement with her and a page won’t load correctly. I’ve only seen it in Brave, and only rarely, but I’ve seen it once or twice so thought I’d mention it.
I also recommend downloading Flashpoint Archive to have Flash games and animations to stay entertained.
There is a 4 GB version and a 2.3 TB version.
Is that Flash exclusive or do they accept other games from that era?
I’m not sure, but I do think it’s just Flash.
> There is a 4 GB version and a 2.3 TB version.
That’s quite the range
When I downloaded it years ago it was 1.8 TB. It’s crazy how big the archive is. The smaller one is just so it’s accessible to most people.
I stumbled across this sort of fascinating area of doomsday prepping a few weeks back.
A nice addition to that: don’t just make it a USB stick, make it a Raspberry Pi. That way you’d have a reasonably low-powered computer you could easily take with you.
Not suggesting this one as it seems a bit expensive to me, but https://www.prepperdisk.com/products/prepper-disk-premium-over-512gb-of-survival-content?view=sl-8978CA41
I thought the whole point of torrenting was to decentralise distribution. I use torrents to get my distros.
In my own little bubble, I thought that’s how most people got their distro.
Magnet link to any Linux package repo torrent, if you don’t mind?
I don’t wanna scrape since it takes forever and burdens the servers.
What happens when they just cut the underwater cables? Torrenting a Linux distro over carrier pigeon would take ages.
We need some more community wifi projects
Community WISPs are cool.
Sneakernet to the rescue. Some of you are too young to know about walking around with boxes full of disks.
A good way to see the future of places like the U.S. is to look at places like North Korea, where they do exactly this: move files around on flash media to avoid the state censors.
Tiny jump drives on pigeons is low-key excellent, imo.
Pigeon latency is horrible, but the bandwidth is pretty great. You could probably load up an adult pigeon with at least 12 TB of media.
https://en.wikipedia.org/wiki/IP_over_Avian_Carriers
Just gonna leave this here for whoever wants to read more on the methodology and potential risks.
> Over a 30-mile (48 km) distance, a single pigeon may be able to carry tens of gigabytes of data in around an hour, which on an average bandwidth basis compared very favorably to early ADSL standards, even when accounting for lost drives.
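For the curious, the implied bit rate is easy to sanity-check; taking the 12 TB figure from above and a one-hour flight:

```
# average bit rate of 12 TB delivered in one hour
echo "scale=3; 12 * 10^12 * 8 / 3600 / 10^9" | bc    # prints ~26.666, i.e. roughly 27 Gbit/s
```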
Compared to what I use at home now, this sounds great
@Maroon I thought torrent technology would be a godsend for package managers.
Why do none of them use it?
I mean, damn.
Torrents are often used for installers, but for packages it tends to be more trouble than it’s worth. Is creating a torrent for a 4 kB library worth it?
I would also add OpenStreetMap to the list.
Okay so where do I find some cheap hard drives? Europe if possible :-)
This post foreshadowed today’s AWS outage.
👀
How would one go about making an offline copy of the repos? Asking for a friend.
Start from here https://wiki.debian.org/DebianRepository/Setup
Arch: https://wiki.archlinux.org/title/DeveloperWiki:NewMirrors
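The short version for Arch is a periodic rsync from a tier mirror that allows it; a hedged sketch where the hostname and local path are placeholders (a full mirror is roughly the 80 GB mentioned below):

```
# mirror the official Arch repos from an rsync-capable mirror (pick one off the mirror list)
rsync -rtlvH --delete-after --delay-updates --safe-links \
    rsync://mirror.example.org/archlinux/ /srv/mirror/archlinux/
```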
The official repo is only about 80 GB; I have an old copy from when I was running an air-gapped system. Not sure about the AUR, it’s probably in the TB range though.
The AUR might not be as big as you think. It wouldn’t work for offline use though, since many AUR packages pull archives from websites during the build process.
Oh yeah, didn’t think about that. Having a bunch of PKGBUILD files isn’t very useful.
I guess you could compile all the things, if you had a lot of spare processing power. A once-a-year snapshot would probably be enough for anything short of a Mad Max future.
That future might not be far off considering what Trump did today. Balance of power is seriously about to shift.
```
man rsync
man cron
xdg-open https://lmgtfy.app/?q=debian+rsync+mirror
```
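And for anyone who’d rather skip the search engine, a hedged sketch of what those man pages add up to (suite, arch, host, and paths are placeholders; debmirror is just one of several tools that can do this):

```
# partial Debian mirror: one suite, one architecture, main section only
debmirror --method=http --host=deb.debian.org --root=debian \
          --dist=trixie --arch=amd64 --section=main \
          --keyring=/usr/share/keyrings/debian-archive-keyring.gpg \
          --progress /srv/mirror/debian
# drop that in a script under /etc/cron.weekly/ and it quietly stays current
```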
Yeah, not gonna lie, I think I heard someone in a YouTube video a while back talk about how the entirety of Wikipedia takes up like 200 gigs or something like that, and it got me seriously considering actually making that offline backup. Shit is scary when countries like the UK are basically blocking you from having easy access to knowledge.
Yeah, it’s surprisingly small when it’s compressed, if you exclude things like images and media. It’s just text, after all. But the high level of compression requires special software to actually read it without uncompressing the entire archive. There are dedicated devices that pretty much only do that: literal Wikipedia readers, where you just give them an archive file and they let you search for and read articles.
If you remove topics you’re not interested in, it can shrink even more.
Sure, but removing knowledge kind of goes against what creating a Wikipedia backup is about…
Well, I doubt I will ever need to know anything about a football player or a car.
“Fellow survivors, oh my God! What are your names?”
“I’m OJ Simpson. This is my friend Aaron Hernandez. And this is his car, Christine.”
Sorry, I’m out of the loop. Is there something particular that triggered this that I missed?
gestures broadly