Will's Blog: Running thousands of LLMs on one GPU is now possible with S-LoRA

Title: Running thousands of LLMs on one GPU is now possible with S-LoRA
Summary: S-LoRA allows each user to be served with a personalized LoRA adapter while the LLM's response is enhanced by adding recent data as context.
Link: Running thousands of LLMs on one GPU is now possible with S-LoRA
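The idea behind serving many personalized models from one GPU is that every LoRA adapter shares the same base weight W and adds only a small low-rank update, y = W x + B A x. Below is a minimal pure-Python sketch of that general LoRA computation, assuming tiny dense matrices stored as lists of rows. It is not S-LoRA's implementation and does not reproduce its batching kernels or memory management; the names `matvec`, `lora_forward` and `serve_batch` are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # a matrix stored as a list of rows
Vector = List[float]
Adapter = Tuple[Matrix, Matrix]  # (A: r x d_in, B: d_out x r), low rank r


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w: Matrix, adapter: Adapter, x: Vector, scale: float = 1.0) -> Vector:
    """Hypothetical helper: y = W x + scale * B (A x).

    W is the shared base weight; only the small A and B matrices differ per user.
    """
    a, b = adapter
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [y0 + scale * d for y0, d in zip(base, delta)]


def serve_batch(w: Matrix, adapters: Dict[str, Adapter],
                requests: List[Tuple[str, Vector]]) -> List[Vector]:
    """Hypothetical helper: each request names its own adapter; W is reused for all."""
    return [lora_forward(w, adapters[user], x) for user, x in requests]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    adapters = {
        "alice": ([[1.0, 0.0]], [[0.0], [1.0]]),
        "bob": ([[0.0, 1.0]], [[1.0], [0.0]]),
    }
    print(serve_batch(W, adapters, [("alice", [1.0, 2.0]), ("bob", [1.0, 2.0])]))
    # prints [[1.0, 3.0], [3.0, 2.0]]
```

Because only A and B are stored per user, adding another personalized adapter costs far less memory than loading another full copy of the model, which is what makes serving many adapters side by side practical.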