• 3 Posts
  • 14 Comments
Joined 1 year ago
cake
Cake day: June 29th, 2023

help-circle

  • Good luck! You can try the huggingface-chat repo, or ollama with this web-ui. Both should be decent, as they have instructions to set up a docker container.

    I believe the Llama 3 models are out there in a torrent somewhere, but I didn’t dig to find it. For the 70B model, you’ll probably need around 64GB of RAM available, but the 7B one should run fine with just 8GB. It will be somewhat slow though, compared to the ChatGPT experience. The self-attention mechanism can be parallelized, which is why you will see much better results on a GPU. According to some others that tested it, if you offload some stuff to RAM, you could see ~10-12 tokens per second on an RTX 3090 for certain 70B models. But more capable ones will be at less than 1 token per second, all depending on the context window you use.

    If you don’t have a GPU available, just give the Phi-3 model a try :D If you quantize it to 4 bits, it can apparently get 12 tokens per second on an iPhone haha. It should play nice with pooling information from a search engine, or a vector database like milvus, qdrant or chroma.


  • What db2 already said. Microsoft just released Phi-3 mini, which could, allegedly, run locally on newer smartphones.

    If I understood correctly, the Rabbit thingy just captures your information locally and then forwards it to their server. So, if you want more power, you could probably do the same by submitting the same info to a bigger open source model than Phi-3, like Llama 3, hosted on your homelab. I believe you can set it up with huggingface/gradio, which sort of provides an API that you could use.

    That way, you don’t need a shitty orange box, and can always get the latest open source models with a few lines of code. There are plenty of open source frameworks in the works at the moment, and I believe that we’re not far off from having multi-modal LLMs running on homelab-level hardware (if you don’t mind a bit of lag).





  • The Framework 13 inch model should be plenty, especially if you want to dev on the go. Much more lightweight and smaller, and you can connect it to external monitors if the screen size is not big enough. Also, you shouldn’t have issues running Linux on either laptops.

    Instead of going for the 16 version, I would use the extra 900-1000 euros (that’s the amount I saw I could save between the two almost maxed-out models) to make a dedicated server or mini-cluster to run your workloads. Deploy Kubernetes or Proxmox on it, and you’ll also get some more practice on it outside work if you want to run stuff for your home lab. That is only if you don’t want to game on your laptop, but I’d still put that money aside to make a desktop.






  • I’ve looked into this before, and it really depends on the type of RFID they use. Older versions have been cracked, but newer ones can’t be copied over (easily or at all).

    If your company is serious about security, you will not be able to put the content of the card on your phone. What newer, more secure versions of RFID do is receive a code from the reader system, replies to it internally, and then sends back the answer. Even if you try to copy this over, you will not be able to open the doors of your facility.

    I think the first step should be to use one of these apps that can read RFID and see what protocol your card uses. If it’s an unsecure one (i.e., only pushes out a code and checks it in their database that it’s yours), you could probably try to copy it over. However, if it’s not, you could also just dissolve the card with some acetone and place the resulting wires in your phone’s case, near the bottom. Like that, it shouldn’t interfere with your phone’s NFC, as that one is usually next to the top area of your phone.