I might take over one of these one-year-free hosted Lemmy instances on my server infrastructure, but I have read several times now that Lemmy's image-hosting system, pict-rs, quickly uses a lot of storage.
The server I could run this on is limited to 32 GB of SSD storage, with no easy way to expand it.
Is there some way to limit image storage use and automatically prune old images that are not user or community icons and the like?
pict-rs doesn’t keep track of how often it serves different images, so there’s not a good metric for pruning old images. That said, 0.4 will introduce functionality for cleaning up processed images (e.g. resizes/thumbnails), removing their files & metadata. If they are viewed again, they will be re-generated.
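In the meantime, age-based pruning of the processed variants is something you could script yourself. A hypothetical sketch (it assumes the resized variants sit in their own directory, which may not match pict-rs's actual on-disk layout, and relies on the regenerate-on-view behavior described above):

```python
import os
import time

def prune_processed(cache_dir: str, max_age_days: float) -> list[str]:
    """Delete processed variants (resizes/thumbnails) older than max_age_days.

    Assumes deleting is safe because the variants are regenerated from the
    originals the next time they are viewed.
    """
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        # Only touch plain files whose last modification is older than the cutoff.
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```

Run it from cron with whatever age threshold fits your disk budget.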
0.4 will also include the ability to scale down images on upload, rather than storing the original resolution. This is not yet implemented, but it’s on my roadmap.
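For intuition, scaling down on upload amounts to fitting the original dimensions into a maximum bounding box while preserving the aspect ratio. A generic sketch (not pict-rs code):

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Return dimensions scaled down so the longer side is at most max_side,
    keeping the aspect ratio. Images already small enough are left untouched."""
    if max(width, height) <= max_side:
        return width, height
    scale = max_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 4000x3000 photo capped at 1024 px becomes 1024x768, which is typically a fraction of the original file size.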
All this said, it is already possible to use pict-rs with S3-compatible object storage rather than block storage. That's a good option if your hosting provider offers it.
Hello there! I am one of the administrators at Beehaw. If I’m reading and understanding your comment correctly, then this could solve our most pressing problem of running out of server disk space.
Is there a time-frame when you expect to have pict-rs 0.4 available?
Is deduplication supported, by re-using images already in storage if newly uploaded images share the same hash?
Yes. It uses sha256 rather than perceptual hashing, but that’s Good Enough™️
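Conceptually, this kind of content-addressed dedup looks like the following (a generic sketch using Python's hashlib, not pict-rs's actual Rust internals):

```python
import hashlib
import os

def store_image(data: bytes, root: str) -> str:
    """Store image bytes under their sha256 digest; identical uploads
    map to the same path, so the file is only written once."""
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(root, digest)
    if not os.path.exists(path):  # an identical upload is already stored
        with open(path, "wb") as f:
            f.write(data)
    return digest
```

Note that sha256 only dedups byte-identical files; the same picture re-encoded or resized hashes differently, which is where perceptual hashing would differ.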
Why not SHA-512 or SHA3?
I chose it at the start of the project 🤷
Maybe it would be worthwhile to make a smooth change to this in the future? https://en.wikipedia.org/wiki/SHA-3#Comparison_of_SHA_functions
Actually, the S3-compatible interface might be interesting for linking pict-rs to Garage…
I am aware of garage, but haven’t tested it yet with pict-rs. It’s a cool project for sure
That sounds promising. Any idea when 0.4 will be released?
Object storage on large cloud providers is not an option for me for various reasons (privacy, legal, etc.).
I can only say “when it’s ready.” I think most of what I want to include in 0.4 is there, but I don’t have a ton of time to work on it currently. I might see if I can get my last feature changes in this weekend, then it will be a matter of ensuring the 0.3 -> 0.4 upgrade is smooth, and that storage migration is solid
Update on this: I got the feature work done this weekend, so now I’ll be testing it a bunch for upgrades and storage migrations
Storage requirements depend entirely on the amount of images that users upload. In the case of slrpnk.net, there are currently 1.6 GB of pict-rs data. You can also use S3 storage, or something like sshfs to mount remote storage.
How much of that is cached from federated instances, though? I can hardly imagine a low-traffic instance like that has already uploaded 1.6 GB of its own images. If it is mostly cached, then that can increase very quickly as new users subscribe to additional communities on other servers.
Now you know why there needs to be a decentralized picture storage hosting that works for the web, in the same way torrents do for even larger data like video.
You have tons of servers hosting the exact same pictures needlessly while sharing none of the hosting costs.
Pictrs over IPFS could be a good start.
That was the original idea of IPFS, no? It's just that they have now pivoted to trying to sell you Filecoin :(
Oh, you said it too.
Yeah I think so, but I have no idea how “trust” works in IPFS.
In torrents, you have to explicitly be seeding that torrent: if you don’t want to seed the file(s), you remove the torrent. With IPFS I think people can just throw whatever in there.
I'm not against including an IPFS layer in pict-rs, but the complexity would go way up. Federating an image between Lemmy servers would require sending the IPFS URI between servers via ActivityPub, and then each receiving server sending that URI to pict-rs. pict-rs would then need to decide, on each server, whether the IPFS-stored image matches that server's image requirements (max filesize, max dimensions, etc.), and if it does, then that pict-rs server would request to pin the image. I don't know exactly how IPFS pinning works, but ideally the image would only be stored locally if it isn't already stored in at least N other locations. If the remote image doesn't match the local server's configuration, it could either be rejected or downloaded & processed (resized, etc.).
Serving ipfs-stored images that aren’t replicated locally might also be slow, but I won’t know for sure unless I actually try building this out.
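The decision flow I have in mind could be sketched roughly like this (purely hypothetical; the function and the `limits` keys are illustrative, not an existing pict-rs or IPFS API):

```python
def pin_decision(size_bytes: int, width: int, height: int,
                 replicas: int, limits: dict) -> str:
    """Decide what a server should do with a federated ipfs-stored image.

    'limits' is assumed to carry the local server's constraints, e.g.
    max_bytes, max_width, max_height, and min_replicas (the N above).
    """
    fits = (size_bytes <= limits["max_bytes"]
            and width <= limits["max_width"]
            and height <= limits["max_height"])
    if not fits:
        # Either reject outright or download & reprocess (resize, etc.).
        return "reject_or_reprocess"
    if replicas >= limits["min_replicas"]:
        # Enough copies already exist elsewhere; no need to pin locally.
        return "skip_pin"
    return "pin"
```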
The great thing is that deduplication would also be built in with the IPFS layer, apart from the other obvious advantages.
How would federated posts look if the original server went down? Just a 404 not found on the picture and the discussion left intact?
This seems to be the case right now, yes.
There is no caching, images from other instances are loaded directly from the remote server by your browser.
I see, well that is one less risk then. I guess with automatic down-scaling in pict-rs 0.4 it will be mostly solved, as there will not be a bunch of 5 MB direct uploads.
Edit: well, thumbnails at least are definitely cached, larger images too; I just tested it on slrpnk.net.
Edit 2: odd, but not all of them. Something is strange… Ah, I think I know what is happening: actual user uploads do not get cached, but images from linked websites do, even if the origin is a federated instance. But those website images are usually quite well optimized.
We would be very interested in a better method for limiting this as well - some kind of age and size limits or automatic pruning would be wonderful.