As part of AWS? S3 stands for “Simple Storage Service” and it is used for storing data in the cloud. A typical S3 setup has a “bucket,” which acts roughly like a folder directory on your computer. Beyond that, you can organize it pretty much however you want. It can store basically anything, as long as it can be serialized into a binary blob. I haven’t worked in AWS in a few years, but I recall it being easy enough to use for storing files when handling file transfers with other services like Lambdas. You just need to configure a few things: the bucket name, the “file name” (really an object key - I say it that way because you don’t necessarily have to store files, since anything stored in S3 is just that binary blob), and the contents themselves.
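Roughly what that looks like in code, if it helps - a minimal sketch with boto3 (the AWS SDK for Python); the bucket name, key, and payload here are made-up placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Upload: you supply the bucket, the object key (the "file name"),
# and the contents as bytes.
s3.put_object(
    Bucket="my-example-bucket",
    Key="reports/example.txt",
    Body=b"hello from s3",
)

# Download the same object and read the bytes back out.
response = s3.get_object(Bucket="my-example-bucket", Key="reports/example.txt")
print(response["Body"].read())
```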
It can be even more than just simple storage when used with other AWS services; the possibilities are pretty much endless.
I was never told exactly what it was for. My suspicion was that it was for acquiring training data for an audio-processing machine learning library. This was about 10 years ago, so after the likes of Siri, but way before something like ChatGPT.
I used to do Mechanical Turk jobs for some quick and easy pocket money. There were several types of tasks you could do, and there was a sort of ranking system to dissuade anyone from inputting junk instead of answering seriously. I usually stuck to surveys and things I would describe as fancy captchas. I recall a few jobs where the task was to record yourself in different environments reading the same script of text. I can’t see that type of job being anything other than training data for AI/ML.
Not the OP, but I have been getting a bug where I select a show/movie to watch in the Roku app, and instead of playing it exits back out to the list of shows/movies. These same shows work just fine in the app on a Google Android TV, or on a computer.
Robots.txt is a plain text file that is accessible via a normal HTTP request. It’s a configuration file, served from the site’s root, that sets rules for which automated web crawlers are allowed to visit which pages. It can spell out both who is and who isn’t allowed. Googlebot is usually the most widely allowed crawler, just because Google’s crawling is how sites end up in search results. But it’s basically the honor system. You could write a scraper today that goes to a page the site says it doesn’t have permission to view, ignores that, and still gets the information.
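To make the “honor system” part concrete, here’s a minimal sketch using Python’s built-in urllib.robotparser; example.com and the bot names are placeholders, and nothing stops a scraper from simply skipping this check:

```python
from urllib.robotparser import RobotFileParser

# robots.txt is fetched over HTTP like any other page on the site.
parser = RobotFileParser("https://example.com/robots.txt")
parser.read()

# Ask whether a given crawler is allowed to fetch a given URL.
print(parser.can_fetch("MyScraperBot", "https://example.com/private/page"))
print(parser.can_fetch("Googlebot", "https://example.com/"))
```

Nothing in that answer is enforced - a well-behaved crawler checks it and backs off, a rude one just requests the page anyway.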
I have been exposed as a fraud and I will take my punishment! j/k, but worth noting