Easy Model Import - Gallery

Now lets pick a model to download and test out. We are going to use luna-ai-llama2-uncensored.Q4_0.gguf, there are a few ways to do this, https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGUF/resolve/main/luna-ai-llama2-uncensored.Q4_0.gguf

The below command requires the Docker container already running, and uses the Model Gallery to download the model. it may also set up a model YAML config file, but we will need to override that for this how to setup!

curl --location 'http://localhost:8080/models/apply' \
--header 'Content-Type: application/json' \
--data-raw '{
    "id": "TheBloke/Luna-AI-Llama2-Uncensored-GGUF/luna-ai-llama2-uncensored.Q4_0.gguf",
    "name": "lunademo"
}'

Yes I know haha - Luna Midori making a how to using the luna-ai-llama2 model - lol

Note

You will need to delete the following 3 files that hugging face downloaded…

chat.tmpl
completion.tmpl
lunademo.yaml

Now lets make 3 files in the models folder.

touch lunademo-chat.tmpl
touch lunademo-completion.tmpl
touch lunademo.yaml

Please note the names for later!

In the "lunademo-chat.tmpl" file add

{{.Input}}

ASSISTANT:

In the "lunademo-completion.tmpl" file add

Complete the following sentence: {{.Input}}

In the "lunademo.yaml" file (If you want to see advanced yaml configs - Link)

backend: llama
context_size: 2000
f16: true ## If you are using cpu set this to false
gpu_layers: 4
name: lunademo
parameters:
  model: luna-ai-llama2-uncensored.Q4_0.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.65
roles:
  assistant: 'ASSISTANT:'
  system: 'SYSTEM:'
  user: 'USER:'
template:
  chat: lunademo-chat
  completion: lunademo-completion

Now that we have that fully set up, we need to reboot the Docker container. Go back to the localai folder and run

docker-compose restart

Now that we got that setup, lets test it out but sending a request by using Curl Or use the Openai Python API!