/j hey some of us only have 10GbE
Jokes aside, I get the classification. I’m pretty solidly in the category of 2c - more tech than some medium business but without the SLA to go with it.
Apologies for being late, I wanted to be as correct as I could be.
So, straight to the point: Nextcloud by default uses plain files if you don’t configure the primary storage to be an S3/object store. As far as I can tell, this is not automatic and is an intentional change at system creation by the original admin. There is a third-party migration script, but there does not appear to be a first-party method of converting between the two. That’s very good news for you! (I think/hope)
My instance was set up as a standalone, so I cannot speak for the all-in-one image. Poking around the root data directory (datadirectory in the config.php), I was able to locate my user account by internal username - which, if you do not use LDAP, will be the shortened login name. On default LDAP configs, this internal username may be a GUID, but that can be changed during the LDAP enablement process by overriding the Internal Username field in the Expert LDAP settings.
Once in the user’s home folder in the root data directory, my subdirectory options are cache, files, files_trashbin, files_versions, uploads.
files contains the “live” structure of how I perceive my Nextcloud home folder in the Web UI and the Nextcloud Desktop sync engine.
files_trashbin is an unstructured data folder containing every file that was deleted by this user and kept per the trash folder’s retention policy (this can be configured at the site level). Files retain their original name, but have a suffix added which takes the form .d######... where the numbers appear to be a Unix timestamp, likely the deletion date. A quick scan of these with the file command in Linux showed that each one had an expected file header based on its extension (e.g., a .png showed as a PNG image with an expected resolution). In the Web UI, there is metadata about which folder the file originally resided in, but I was not able to quickly identify this in the file structure. I believe this info is coming in from the SQL database.
files_versions is how Nextcloud stores its file version history (if enabled). Old versions are cleaned up per a set of default behaviors to keep more copies of more recent changes, up to a maximum age deletion threshold set at the site level. This folder is stored in approximately the same structure as the main files live structure, however each copy of each version has a suffix .v######... appended, where the number appears to be the Unix timestamp the version was taken (*I have not verified that this exactly matches what the UI shows, nor have I read the source code that generates this). I’ve spot checked via the Linux file command and sha256 that the files in this versions structure appear to be real data - tested one Excel doc and one plain text doc.
I think that should get a fairly rough answer to your original question, but if I left something out you’re curious about, let me know.
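If you want to poke at those suffixes yourself, here is a minimal Python sketch. It assumes the .d/.v suffix really is a Unix timestamp in seconds, which is only my reading of the filenames and not something checked against the Nextcloud source. The function name split_suffix and the sample filename are made up for illustration.

```python
import re
from datetime import datetime, timezone
from typing import Optional, Tuple

# Assumed naming pattern (from observation only, not from Nextcloud's source):
#   "<original name>.d<digits>"  in files_trashbin
#   "<original name>.v<digits>"  in files_versions
_SUFFIX = re.compile(r"^(?P<name>.+)\.(?P<kind>[dv])(?P<ts>\d+)$")


def split_suffix(filename: str) -> Optional[Tuple[str, str, datetime]]:
    """Return (original name, 'trash' or 'version', UTC time), or None if no suffix."""
    match = _SUFFIX.match(filename)
    if match is None:
        return None
    kind = "trash" if match.group("kind") == "d" else "version"
    # Interpreting the digits as seconds since the Unix epoch is an assumption.
    when = datetime.fromtimestamp(int(match.group("ts")), tz=timezone.utc)
    return match.group("name"), kind, when


if __name__ == "__main__":
    # Hypothetical filename, not taken from a real instance.
    print(split_suffix("budget.xlsx.v1700000000"))
```

Running it over a directory listing from files_trashbin or files_versions gives a quick sanity check of whether the decoded dates line up with what the Web UI shows.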
Finally, I wanted to thank you for making me actually take a look at how I had decided to configure and back up my Nextcloud instance and ngl it was kind of a mess. The trash bin and versions can both get out of hand if you have frequently changing or deleting/recreating files (I have network synchronization glued onto some of my games that do not have good remote save support). Retention policy on trash and versions cleaned up extraneous data a lot, as only one of those was partially configured.
I can see a lot of room for improvements… just gotta rip the band-aid off and make intelligent decisions rather than just slapping on an rsync job that connects to the Nextcloud instance and replicates down the files and backend database. Not terrible, but not great.
In the backend I’m already using ZFS for my files and Redis database, but my core SQL database was located on the server’s root partition (which is XFS - I’d rather not mess with a DKMS module from a boot CD if something happens and upstream borks the compile, which is precisely what happened when I upgraded to OpenZFS 2.1.15).
I do not have automatic ZFS snapshots configured at this time, but based on the above, I’m reasonably confident that I could get data back from a ZFS snapshot if any of the normal guardrails within Nextcloud failed or did not work as intended (trash bin and internal version history). Plus, the data in that cursed rsync backup should be at least 90% functional.
I forget where I originally found this and Google on my phone was unhelpful.
My favorite annoying trick is x -=- 1. It looks like it shouldn’t work because -=- isn’t actually an operator, but it functions as an increment equivalent to x += 1.
It works because -= still functions as “subtract and assign”, but the second minus applies to the 1, making it -1.
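A quick demonstration in Python (the original trick wasn't tied to a language, so picking Python here is my assumption; the same parse applies in most C-like languages):

```python
x = 5
x -=- 1          # parsed as: x -= (-1)
assert x == 6    # same result as an increment

y = 5
y += 1           # the conventional spelling
assert x == y
```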
Probably best to go with something in the 3.5" line, unless you’re going enterprise 2.5" (which are entirely different birds than consumer drives)
Whatever you get for your NAS, make sure it’s CMR and not SMR. SMR drives do not perform well in NAS arrays.
Many years ago I went for some low cost 2.5" Barracudas for my servers, only to find out years after I bought them that they were SMR, and that may have been a contributing factor to them not being as fast as I expected.
TLDR: Read the datasheet
I don’t have a full answer to snapshots right now, but I can confirm Nextcloud has VFS support on Windows. I’ve been working on a project to move myself over to it from Syno drive. Client wise, the two have fairly similar features with one exception - Nextcloud generates one Explorer sidebar object per connection, which I think Synology handles as shortcuts in the one directory. I’d prefer if NC did the latter or allowed me to choose, but I’m happier with what I got for now.
As for the snapshotting, you should be able to snapshot the underlying FS/DB at the same time, but I haven’t poked deeply at that. Files I believe are plain (I will disassemble my nextcloud server to confirm this tonight and update my comment), but some do preserve version history so I want to be sure before I give you final confirmation. The Nextcloud root data directory is broken up by internal user ID, which is an immutable field (you cannot change your username even in LDAP), probably because of this filesystem.
One thing that may interest you is the external storage feature, which I’ve been working on migrating a large data set I have to:
Admin docs for reference: https://docs.nextcloud.com/server/latest/admin_manual/configuration_files/external_storage_configuration_gui.html
I use LDAP user auth to my nextcloud, with two external shares to my NAS using a pass-through session password (the NAS is AD joined to the same domain as Nextcloud uses for LDAPS). I don’t know if/how the “store password in database” option is encrypted, but if anyone knows I would be curious, because using session passwords prevents the user from sharing the folder to at least a federated destination (I tried with my friend’s NC server, haven’t tried with a local user yet but I assume the same limitations apply). If that’s your vibe, then this is a feature XD.
One of my two external storage mounts is a “common” share with multiple users accessing the same directory, and the second share is \\nas.example.com\home\nextcloud. Internally, I believe these are handled by PHP spawning smbclient subprocesses, so if you have lots of remote files and don’t want to nuke your Nextcloud, you will probably need to increase the PHP child limits (that took me too long to solve lol)
That funny sub-mount name above handles an edge case where Nextcloud/DAV can’t handle directories with certain characters - notably the # that Synology uses to expose their #recycle and #snapshot structures. This means that remote mount to SMB has a limitation at the moment where you can’t mount the base share of a Synology NAS that has this feature enabled. I tried a server-side Nextcloud plugin to filter this out before it was exposed to DAV, but it was glitchy. Unsure if this was because I just had too many files for it to handle thanks to the way Synology snapshots are exposed or if it actually was something else - either way I worked around the problem for now by not ever mounting a base share of my Synology NAS. Other snapshot exposure methods may be affected - I have a ZFS TrueNAS Core, so maybe I’ll throw that at it and see if I can break Nextcloud again :P
Edit addon: OP just so I answer your real question when I get to this again this evening - when you said that Nextcloud might not meet your needs, was your concern specifically the server-side data format? I assume from the rest of your questions that you’re concerned with data resilience and the ability to get your data back without any vendor tools - that it will just be there when you need it.
I’m not entirely sure what you’re seeking to accomplish here - are you looking to just impose authorization on a subset of the images? Probably those should be in a non-public bucket for starters.
Looking to only give certain people access to files and also have a nicer UI (a la Google Drive / Photos)? Maybe plain S3 isn’t the play here and a dedicated application is needed for that subset.
Pre-signed URLs may also be useful for what you’re trying to solve. https://docs.min.io/docs/javascript-client-api-reference.html#presignedGetObject
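To give a feel for the idea behind pre-signed URLs, here is a stdlib-only Python sketch: the server signs the path plus an expiry time with a secret, and anyone holding the resulting link can fetch the object until it expires. This is a simplified illustration only. It is NOT the S3/MinIO signing algorithm and does not use the MinIO client API; the secret, function names and query parameter names are all made up.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret-key"  # hypothetical server-side secret


def _signature(path: str, expires: int) -> str:
    # HMAC over "path:expiry" so neither value can be changed without detection.
    return hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()


def make_presigned(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Build a link to `path` that stays valid for ttl_seconds."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{path}?{query}"


def verify(url: str, now: Optional[float] = None) -> bool:
    """Check that the link is unmodified and not yet expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(sig, _signature(parsed.path, expires))
```

The point for your use case: the bucket itself stays private, and your own application decides who gets handed a short-lived link.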
Adding on one aspect to things others have mentioned here.
I personally have both ports/URLs opened and VPN-only services.
IMHO, it also depends on the exposure tolerance the software has or risk of what could get compromised if an attacker were to find the password.
Start by thinking of the VPN itself (Tailscale, Wireguard, OpenVPN, IPSec/IKEv2, Zerotier) as a service just like the service you’re considering exposing.
Almost all (working on the all part lol) of my external services require TOTP/2FA and are required to be directly exposed - i.e. VPN gateway, jump host, file server (nextcloud), git server, PBX, music reflector I used for D&D, game servers shared with friends. Those ones I either absolutely need to be external (VPN, jump) or are external so that I don’t have to deal with the complicated networking of per-user firewalls so my friends don’t need to VPN to me to get something done.
The second part for me is tolerance to be external and what risk it is if it got popped. I have a LOT of things I just don’t want on the web - my VM control panels (proxmox, vSphere, XCP), my UPS/PDU, my NAS control panel, my monitoring server, my SMB/RDP sessions, etc. That kind of stuff is super high risk - there’s a lot of damage that someone could do with that, a LOT of attack surface area, and, especially in the case of embedded firmware like the UPSs and PDUs, potentially software that the vendor hasn’t updated in years with who-knows-what bugs lurking in it.
So there’s not really a one size fits all kind of situation. You have to address the needs of each service you host on a case by case basis. Some potential questions to ask yourself (but obviously a non-exhaustive list):
So, as you can see, it’s not just cut and dry. You have to think about each service you host and what it does.
Larger well known products - such as Guacamole, Nextcloud, Owncloud, strongswan, OpenVPN, Wireguard - are known to behave well under these circumstances. That’s going to factor in to this too. Many times the right answer will be to expose a port - the most important thing is to make an active decision to do so.
I’m not the commenter but I can take a guess - I would assume “data source” refers to a machine readable database or aggregator.
Making the system capable of turning off a generic external service in an automated way isn’t necessarily trivial, but it’s doable given appropriate systems.
Knowing when to turn a service off is going to be the million dollar question. It not only has to determine what the backend application version is during its periodic health check, it also needs to then make an autonomous decision that a vulnerability exists and is severe enough to take action.
Home Assistant probably provides a “safe list” of versions that instances regularly pull down and automatically disconnect if they determine themselves to be affected, or, if the remote UI connection passes through the Home Assistant Central servers, the Central servers could maintain that safety database and off switch. (Note - I don’t have a home assistant so I can’t check myself)
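As a rough sketch of what that “decide when to turn it off” step could look like, here is some Python. To be clear, this is my guess at the general shape, not how Home Assistant actually does it. The advisory format, the severity threshold and every name below are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn '2023.3.1' into (2023, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_affected(installed: str, ranges: Iterable[Tuple[str, str]]) -> bool:
    """True if installed falls in any [first_bad, first_fixed) range."""
    version = parse_version(installed)
    return any(
        parse_version(first_bad) <= version < parse_version(first_fixed)
        for first_bad, first_fixed in ranges
    )


def should_disable_remote(installed: str, advisories: List[Dict], min_severity: float = 7.0) -> bool:
    """Decide whether remote access should be switched off for this version."""
    return any(
        adv["severity"] >= min_severity and is_affected(installed, adv["ranges"])
        for adv in advisories
    )


# Hypothetical advisory feed the instance would pull down periodically.
ADVISORIES = [
    {"id": "EXAMPLE-1", "severity": 9.8, "ranges": [("2023.1.0", "2023.3.2")]},
    {"id": "EXAMPLE-2", "severity": 3.1, "ranges": [("2023.3.2", "2023.4.0")]},
]
```

The hard part is still the one mentioned above: somebody has to curate that advisory list and pick the severity cutoff. The check itself is the easy bit.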
I think the company that bought VMware Fusion and Workstation also owns Parallels
As others said, depends on your use case. There are lots of good discussions here about mirroring vs single disks, different vendors, etc. Some backup systems may want you to have a large filesystem available that would not be otherwise attainable without a RAID 5/6.
Enterprise backups tend to follow a recommendation called 3-2-1: three copies of the data, on two different types of media, with one copy off-site.
On my home system, I have 3-2-0 for most data and 4-3-0 for my most important virtual machines. My home system doesn’t have an off-site, but I do have two external hard drives connected to my NAS.
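The x-y-z shorthand is usually read as copies, media types, off-site copies. A small Python sketch to score a setup that way is below. The layout in it is a made-up example that happens to come out as 3-2-0, not a description of my actual hardware, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Copy:
    label: str
    medium: str            # e.g. "ssd", "hdd", "tape", "cloud"
    offsite: bool = False


def backup_score(copies: List[Copy]) -> Tuple[int, int, int]:
    """Return (total copies, distinct media types, off-site copies)."""
    total = len(copies)
    media = len({c.medium for c in copies})
    offsite = sum(1 for c in copies if c.offsite)
    return total, media, offsite


def meets_3_2_1(copies: List[Copy]) -> bool:
    total, media, offsite = backup_score(copies)
    return total >= 3 and media >= 2 and offsite >= 1


# Hypothetical home setup: the source counts as the first copy.
home = [
    Copy("source machine", "ssd"),
    Copy("NAS primary array", "hdd"),
    Copy("USB backup drive", "hdd"),
]

assert backup_score(home) == (3, 2, 0)
assert not meets_3_2_1(home)   # no off-site copy yet
```

Adding one entry with offsite=True (a drive at a friend's house, a cloud bucket) is what would tip this example over to a full 3-2-1.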
Story time
I had one of my two backup drives fail a few months ago. Literally actually nothing of value was lost, just went down to the electronics shop and bought a bigger drive from the same vendor (preserving the one on each vendor approach). Reformatted the disk, recreated the backup job, then ran the first transfer. Pretty much not a big deal, all the data was still in 2 other places - the source itself, and the NAS primary array.
The most important thing to determine when you plan a backup - think about how valuable the data is to you. That’s how much you might be willing to spend on keeping that data safe.
Running nextcloud (non docker version) and I don’t see near so many client updates - usually once every few weeks, which would be a reasonable expected pace. Server updates are less frequent.
On Windows (all of my primary devices), I just install the NC client update and skip the explorer restart, pending full reboot later. Tis the nature of literally anything that deeply integrates with Explorer. I’ve seen explorer “death” during updates from several vendors that have similar explorer plugins, not just NC. Explorer sometimes just decides to nope out even without NC updating.
Now on one device I hadn’t opened for a while, I saw NC run two updates in a row, but that was my fault for procrastinating the first one.
Here’s the desktop release history: https://github.com/nextcloud/desktop/releases
I don’t see a “one every day” within the block of time between Dec 6 and today, unless you had the release candidate builds which may have been more frequent in a few spots.
It being a laptop will almost undoubtedly make that endeavour more challenging. Off hand, I can’t think of a single non-proprietary internal connector from a major vendor that doesn’t already have a protocol established.
If there’s spare I/O, it’s most likely either not hooked up, was only used as a debug header, or fused off as a feature not available on that model. If it is indeed connected to something, you’d need to find documentation on that exact model of laptop since boards can sometimes vary even within the same series (such as whether a GPU is available). Chances are, whatever you find will need a specific vendor library that may or may not work on your version of the OS.
Unlike RPi and similar devices, you won’t find many consumer x86 devices that leave GPIO available and documented.
Off-hand, I think almost every LCD display I’ve encountered on x86 is plugged in to either a serial (for character displays) or higher-level protocol (for more complex displays)
Possibly important detail - what type of computer do you propose running this on? Most methods that are common if you search the internet or ask here will likely apply to Raspberry Pi and its clones, but if you have something more esoteric it might not work.
Man this was a depressing post to read.
In the IT world, we just call that a server. The usual golden rule for backups is 3-2-1: three copies of the data, on two different types of media, with one copy off-site.
So, if the data is only server side, it’s just data. If the data is only client side, it’s just data. But if the data is fully replicated on both sides, now you have a backup.
There’s a related adage regarding backups: “if there’s two copies of the data, you effectively have one. If there’s only one copy of the data, you can never guarantee it’s there”. Basically, it means you should always assume one copy somewhere will fail and you will be left with n-1 copies. In your example, if your server failed or got ransomwared, you wouldn’t have a complete dataset since the local computer doesn’t have a full replica.
I recently had a backup drive fail on me, and all I had to do was just buy a new one. No data loss, I just regenerated the backup as soon as the drive was spun up. I’ve also had to restore entire servers that have failed. Minimal data loss since the last backup, but nothing I couldn’t rebuild.
Edit: I’m not saying what you’re asking for is wrong or bad, I’m just saying “backup” isn’t the right word to ask about. It’ll muddy some of the answers as to what you’re really looking for.
What platform?
Another user said it - what you’re asking for isn’t a backup, it’s just data transfer.
It sounds like you’re looking for a storage backend that hosts all your data and can download data to the client side on the fly.
If your use case is Windows, Nextcloud Desktop may be what you’re looking for. I have a similar setup with the game clips folder. It detects changes and auto uploads them, while deleting less recently used local data that’s already safely server side. This feature might be in Mac but I haven’t tested it.
Backup wise, I capture an rsync of the nextcloud database and filesystem server-side and store it on a different chassis. That then gets backed up again to a USB drive I can grab and run.
Nextcloud also supports external storage, which the server directly connects to: https://docs.nextcloud.com/server/latest/admin_manual/configuration_files/external_storage_configuration_gui.html
Oh I am in fact giving the giant auto complete function little credit. But just like any computer system, an AI can reflect the biases of its creators and dataset. Similarly, the computer can only give an answer to the question it has been asked.
Dataset wise, we don’t know exactly what the bot was trained on, other than “a lot”. I would like to hope its creators acted in good judgement, but as creators/maintainers of the AI, there may be an inherent (even if unintentional) bias towards the creation and adoption of AI. Just like how some speech recognition models have issues with some dialects or image recognition has issues with some skin tones - both based on the datasets they ingested.
The question itself invites at least some bias and only asks for benefits. I work in IT, and I see this situation all the time with the questions some people have in tickets: the question will be “how do I do x”, and while x is a perfectly reasonable thing for someone to want to do, it’s not really the final answer. As reasoning humans, we can also take the context of a question to provide additional details without blindly reciting information from the first few lmgtfy results.
(Stop reading here if you don’t want a ramble)
AI is growing yes and it’s getting better, but it’s still a very immature field. Many of its beneficial cases have serious drawbacks that mean it should NOT be “given full control of a starship”, so to speak.
While OP’s question is about the benefits, I think it’s also important to talk about the drawbacks at the same time. All that information could be inadvertently filtered out. Would you blindly trust the health of your child or significant other to a chatbot that may or may not be hallucinating? Would you want your boss to fire you because the computer determined your recorded task time to resolution was low? What about all those dozens of people you helped in side chats that don’t have tickets?
There’s a great saying about not letting perfection get in the way of progress, meaning that we shouldn’t get too caught up on getting the last 10-20% of completion. But with decision making that can affect peoples’ lives and livelihoods, we need to be damn sure the computer is going to make the right decision every time or not trust it to have full control at all.
As the future currently stands, we still need humans constantly auditing the decisions of our computers (both standard procedural and AI) for safety’s sake. All of those examples above could have been solved by a trained human gating the result. In the powershell case, my coworker was that person. If we’re trusting the computers with as much decision making as that Bing answer proposes, the AI models need to be MUCH better trained at how to do their jobs than they currently are. Am I saying we should stop using and researching AI? No, but not enough people currently understand that these tools have incredibly rough edges and the ability for a human to verify answers is absolutely critical.
Lastly, are humans biased? Yes absolutely. You can probably see my own bias in the construction of this answer.
Seems a bit biased to ask an AI for the benefits of AI…
Not saying anything specific is wrong, just that appearances matter
I missed the Photoshop lol…
I’ve been through enough airports with that doggo profile and a similar message I hadn’t considered the possibility it wasn’t some new way TSA was printing their “don’t pet the service dogs” poster.