Easy Model Import - Downloaded
Now lets pick a model to download and test out. We are going to use luna-ai-llama2-uncensored.Q4_0.gguf
, there are a few ways to do this,
Now lets download and move the model.
Using that link download the luna-ai-llama2-uncensored.ggmlv3.q5_K_M.bin
model, once done, move the model.bin into the models folder.
Yes I know haha - Luna Midori
making a how to using the luna-ai-llama2
model - lol
Now lets make 3 files into the models folder.
touch lunademo-chat.tmpl
touch lunademo-completion.tmpl
touch lunademo.yaml
Please note the names for later!
In the "lunademo-chat.tmpl"
file add
{{.Input}}
ASSISTANT:
In the "lunademo-completion.tmpl"
file add
Complete the following sentence: {{.Input}}
In the "lunademo.yaml"
file (If you want to see advanced yaml configs - Link)
backend: llama
context_size: 2000
f16: true ## If you are using cpu set this to false
gpu_layers: 4
name: lunademo
parameters:
model: luna-ai-llama2-uncensored.Q4_0.gguf
temperature: 0.2
top_k: 40
top_p: 0.65
roles:
assistant: 'ASSISTANT:'
system: 'SYSTEM:'
user: 'USER:'
template:
chat: lunademo-chat
completion: lunademo-completion
Now that we have that fully set up, we need to reboot the docker. Go back to the localai folder and run
docker-compose restart
Now that we got that setup, lets test it out but sending a request by using Curl Or use the Openai Python API!