Subsections of How-tos

Easy Model Import - Downloaded

Now let's pick a model to download and test out. We are going to use luna-ai-llama2-uncensored.Q4_0.gguf; there are a few ways to get it.

Now let's download and move the model.

Link - https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGUF/resolve/main/luna-ai-llama2-uncensored.Q4_0.gguf

Using that link, download the luna-ai-llama2-uncensored.Q4_0.gguf model. Once the download finishes, move the .gguf file into the models folder.
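For example, assuming you are in the LocalAI folder and the models folder already exists, you could fetch and move the file from the command line like this:

wget https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGUF/resolve/main/luna-ai-llama2-uncensored.Q4_0.gguf
mv luna-ai-llama2-uncensored.Q4_0.gguf models/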

Yes, I know, haha - Luna Midori making a how-to using the luna-ai-llama2 model - lol

Now let's make 3 files in the models folder.

touch lunademo-chat.tmpl
touch lunademo-completion.tmpl
touch lunademo.yaml

Please note the names for later!

In the "lunademo-chat.tmpl" file add

{{.Input}}

ASSISTANT:

In the "lunademo-completion.tmpl" file add

Complete the following sentence: {{.Input}}

In the "lunademo.yaml" file (If you want to see advanced yaml configs - Link)

backend: llama
context_size: 2000
f16: true ## If you are using CPU, set this to false
gpu_layers: 4
name: lunademo
parameters:
  model: luna-ai-llama2-uncensored.Q4_0.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.65
roles:
  assistant: 'ASSISTANT:'
  system: 'SYSTEM:'
  user: 'USER:'
template:
  chat: lunademo-chat
  completion: lunademo-completion

Now that we have that fully set up, we need to restart the Docker container. Go back to the LocalAI folder and run

docker-compose restart
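Once the container is back up, you can optionally confirm that the lunademo config was picked up by listing the models through the OpenAI-compatible endpoint:

curl http://localhost:8080/v1/models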

Now that we have that set up, let's test it out by sending a request using Curl or the OpenAI Python API!

Easy Model Import - Gallery

Now let's pick a model to download and test out. We are going to use luna-ai-llama2-uncensored.Q4_0.gguf; there are a few ways to do this. The direct download link is https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGUF/resolve/main/luna-ai-llama2-uncensored.Q4_0.gguf

The command below requires the Docker container to already be running, and uses the Model Gallery to download the model. It may also set up a model YAML config file, but we will need to override that for this how-to!

curl --location 'http://localhost:8080/models/apply' \
--header 'Content-Type: application/json' \
--data-raw '{
    "id": "TheBloke/Luna-AI-Llama2-Uncensored-GGUF/luna-ai-llama2-uncensored.Q4_0.gguf",
    "name": "lunademo"
}'
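The apply call returns right away with a job UUID while the download continues in the background. On recent LocalAI versions you can poll that job to watch the progress (the UUID below is a placeholder for the one returned by the call above):

curl http://localhost:8080/models/jobs/<uuid-from-the-response-above>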

Yes, I know, haha - Luna Midori making a how-to using the luna-ai-llama2 model - lol

Note

You will need to delete the following 3 files that the Hugging Face download created, since we will replace them with our own:

  • chat.tmpl
  • completion.tmpl
  • lunademo.yaml

Now let's make 3 files in the models folder.

touch lunademo-chat.tmpl
touch lunademo-completion.tmpl
touch lunademo.yaml

Please note the names for later!

In the "lunademo-chat.tmpl" file add

{{.Input}}

ASSISTANT:

In the "lunademo-completion.tmpl" file add

Complete the following sentence: {{.Input}}

In the "lunademo.yaml" file (If you want to see advanced yaml configs - Link)

backend: llama
context_size: 2000
f16: true ## If you are using CPU, set this to false
gpu_layers: 4
name: lunademo
parameters:
  model: luna-ai-llama2-uncensored.Q4_0.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.65
roles:
  assistant: 'ASSISTANT:'
  system: 'SYSTEM:'
  user: 'USER:'
template:
  chat: lunademo-chat
  completion: lunademo-completion

Now that we have that fully set up, we need to restart the Docker container. Go back to the LocalAI folder and run

docker-compose restart

Now that we have that set up, let's test it out by sending a request using Curl or the OpenAI Python API!

Easy Request - Curl

Now we can make a curl request!

Curl Chat API -

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "lunademo",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9 
   }'
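If the model is set up correctly, LocalAI replies in the standard OpenAI chat-completion format. A trimmed, illustrative response (your id, timestamps, and text will differ) looks roughly like this:

{
  "model": "lunademo",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you! How can I help you today?"
      }
    }
  ]
}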

Curl Completion API -

curl --request POST \
  --url http://localhost:8080/v1/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "lunademo",
    "prompt": "function downloadFile(string url, string outputPath) {",
    "max_tokens": 256,
    "temperature": 0.5
}'

See OpenAI API for more info! Have fun using LocalAI!

Easy Request - Openai

Now we can make an OpenAI request!

OpenAI Chat API Python -

import os
import openai
openai.api_base = "http://localhost:8080/v1"
openai.api_key = "sx-xxx"
OPENAI_API_KEY = "sx-xxx"
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

completion = openai.ChatCompletion.create(
  model="lunademo",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How are you?"}
  ]
)

print(completion.choices[0].message.content)

OpenAI Completion API Python -

import os
import openai
openai.api_base = "http://localhost:8080/v1"
openai.api_key = "sx-xxx"
OPENAI_API_KEY = "sx-xxx"
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

completion = openai.Completion.create(
  model="lunademo",
  prompt="function downloadFile(string url, string outputPath) ",
  max_tokens=256,
  temperature=0.5)

print(completion.choices[0].text)
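Note that these snippets use the pre-1.0 openai Python client (openai.api_base, openai.ChatCompletion, openai.Completion). If those attributes are missing, you likely have a newer client installed and can pin an older release, for example:

pip install "openai<1.0"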

See OpenAI API for more info! Have fun using LocalAI!

Easy Setup - CPU Docker

We are going to run LocalAI with docker-compose for this setup.

Let's clone LocalAI with git.

git clone https://github.com/go-skynet/LocalAI

Then we will cd into the LocalAI folder.

cd LocalAI

At this point we want to set up our .env file; here is a copy for you to use if you wish. Please make sure its settings match the docker-compose file we set up later.

## Set number of threads.
## Note: prefer the number of physical cores. Overbooking the CPU degrades performance notably.
THREADS=2

## Specify a different bind address (defaults to ":8080")
# ADDRESS=127.0.0.1:8080

## Default models context size
# CONTEXT_SIZE=512
#
## Define galleries.
## Models to install will be visible in `/models/available`
GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]

## CORS settings
# CORS=true
# CORS_ALLOW_ORIGINS=*

## Default path for models
#
MODELS_PATH=/models

## Enable debug mode
DEBUG=true

## Specify a build type. Available: cublas, openblas, clblas.
# Do not uncomment this as we are using CPU:
# BUILD_TYPE=cublas

## Uncomment and set to true to enable rebuilding from source
REBUILD=true

## Enable go tags, available: stablediffusion, tts
## stablediffusion: image generation with stablediffusion
## tts: enables text-to-speech with go-piper 
## (requires REBUILD=true)
#
#GO_TAGS=tts

## Path where to store generated images
# IMAGE_PATH=/tmp

## Specify a default upload limit in MB (whisper)
# UPLOAD_LIMIT
# HUGGINGFACEHUB_API_TOKEN=Token here

Now that we have the .env set, let's set up our docker-compose file. It will use a container image from quay.io. Also note this docker-compose file is for CPU only.

version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:master
    tty: true # enable colorized logs
    restart: always # should this be on-failure ?
    ports:
      - 8080:8080
    env_file:
      - .env
    volumes:
      - ./models:/models
    command: ["/usr/bin/local-ai" ]

Make sure to save that in the root of the LocalAI folder. Then let's spin up the Docker container. Run this in a CMD or Bash shell:

docker-compose up -d --pull always

Now we let that finish setting up. Once it is done, let's check to make sure our huggingface / localai galleries are working (wait until you see the screen below before doing this).
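If you want to follow the startup progress while it builds and loads, you can tail the container logs (the service is named api in the docker-compose file above):

docker-compose logs -f api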

You should see:

┌───────────────────────────────────────────────────┐
│                   Fiber v2.42.0                   │
│               http://127.0.0.1:8080               │
│       (bound on host 0.0.0.0 and port 8080)       │
│                                                   │
│ Handlers ............. 1  Processes ........... 1 │
│ Prefork ....... Disabled  PID ................. 1 │
└───────────────────────────────────────────────────┘
Then check the galleries:

curl http://localhost:8080/models/available

The output will be a long JSON list of every model in the configured galleries.
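A heavily trimmed, illustrative entry (the real list is long, and the exact fields can vary between LocalAI versions) might look like this:

[
  {
    "name": "bert-embeddings",
    "gallery": {
      "name": "model-gallery",
      "url": "github:go-skynet/model-gallery/index.yaml"
    }
  }
]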

Now that we have that set up, let's go download a model, either by Downloading It directly or by using the Gallery!

Easy Setup - Demo

This is for Linux, macOS, or Windows hosts. Requirements: Docker Desktop, Python 3.11, and Git.

Linux Hosts:

There are Full_Auto installers compatible with some types of Linux distributions; feel free to use them, but note that they may not fully work. If you need to install something manually, please use the links at the top.

git clone https://github.com/lunamidori5/localai-lunademo.git

cd localai-lunademo

# Pick the Full_Auto script for your type of Linux. If you already have Python, Docker, and docker-compose installed, skip this chmod, but make sure you still chmod the Setup_Linux.sh file.

chmod +x Full_Auto_setup_Debian.sh or chmod +x Full_Auto_setup_Ubutnu.sh

chmod +x Setup_Linux.sh

# Make sure to install CUDA on your host OS and in Docker if you plan on using a GPU

./(the setupfile you wish to run)

Windows Hosts:

REM Make sure you have git, docker-desktop, and python 3.11 installed

git clone https://github.com/lunamidori5/localai-lunademo.git

cd localai-lunademo

call Setup.bat

MacOS Hosts:

  • I need some help working on a macOS setup file. If you are willing to help out, please contact Luna Midori on Discord or put in a PR on Luna Midori's GitHub.

Video How Tos

  • Ubuntu - COMING SOON
  • Debian - COMING SOON
  • Windows - COMING SOON
  • MacOS - PLANNED - NEED HELP

Enjoy LocalAI! (If you need help, contact Luna Midori on Discord.)

Known issues:

  • Running Setup.bat or Setup_Linux.sh from Git Bash on Windows does not work.
  • Running over SSH or other remote command line based apps may bug out, load slowly, or crash.
  • There seems to be a bug with docker-compose not running. (Main.py workaround added)

Easy Setup - Embeddings

To install an embedding model, run the following command

curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
     "id": "model-gallery@bert-embeddings"
   }'  

Now we need to make a bert.yaml file in the models folder

backend: bert-embeddings
embeddings: true
name: text-embedding-ada-002
parameters:
  model: bert

Restart LocalAI after you change a YAML file.
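If you are running the docker-compose setup from the other how-tos, that just means running the restart from the LocalAI folder:

docker-compose restart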

When you would like to query the model from the CLI, you can run:

curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The food was delicious and the waiter...",
    "model": "text-embedding-ada-002"
  }'
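The response follows the OpenAI embeddings format. A heavily trimmed, illustrative reply looks roughly like this (a real embedding vector contains hundreds of floats):

{
  "model": "text-embedding-ada-002",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.0456, 0.0789]
    }
  ]
}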

See OpenAI Embedding for more info!

Easy Setup - GPU Docker

We are going to run LocalAI with docker-compose for this setup.

Let's clone LocalAI with git.

git clone https://github.com/go-skynet/LocalAI

Then we will cd into the LocalAI folder.

cd LocalAI

At this point we want to set up our .env file; here is a copy for you to use if you wish. Please make sure its settings match the docker-compose file we set up later.

## Set number of threads.
## Note: prefer the number of physical cores. Overbooking the CPU degrades performance notably.
THREADS=2

## Specify a different bind address (defaults to ":8080")
# ADDRESS=127.0.0.1:8080

## Default models context size
# CONTEXT_SIZE=512
#
## Define galleries.
## Models to install will be visible in `/models/available`
GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]

## CORS settings
# CORS=true
# CORS_ALLOW_ORIGINS=*

## Default path for models
#
MODELS_PATH=/models

## Enable debug mode
DEBUG=true

## Specify a build type. Available: cublas, openblas, clblas.
BUILD_TYPE=cublas

## Uncomment and set to true to enable rebuilding from source
REBUILD=true

## Enable go tags, available: stablediffusion, tts
## stablediffusion: image generation with stablediffusion
## tts: enables text-to-speech with go-piper 
## (requires REBUILD=true)
#
#GO_TAGS=tts

## Path where to store generated images
# IMAGE_PATH=/tmp

## Specify a default upload limit in MB (whisper)
# UPLOAD_LIMIT
# HUGGINGFACEHUB_API_TOKEN=Token here

Now that we have the .env set, let's set up our docker-compose file. It will use a container image from quay.io. Also note this docker-compose file is for CUDA only.

version: '3.6'

services:
  api:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    image: quay.io/go-skynet/local-ai:master-cublas-cuda12
    tty: true # enable colorized logs
    restart: always # should this be on-failure ?
    ports:
      - 8080:8080
    env_file:
      - .env
    volumes:
      - ./models:/models
    command: ["/usr/bin/local-ai" ]

Make sure to save that in the root of the LocalAI folder. Then let's spin up the Docker container. Run this in a CMD or Bash shell:

docker-compose up -d --pull always
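Optionally, once the container is running you can confirm that the GPU is actually visible inside it (this assumes the NVIDIA Container Toolkit is set up on the host and the service is named api as in the compose file above):

docker-compose exec api nvidia-smi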

Now we let that finish setting up. Once it is done, let's check to make sure our huggingface / localai galleries are working (wait until you see the screen below before doing this).

You should see:

┌───────────────────────────────────────────────────┐
│                   Fiber v2.42.0                   │
│               http://127.0.0.1:8080               │
│       (bound on host 0.0.0.0 and port 8080)       │
│                                                   │
│ Handlers ............. 1  Processes ........... 1 │
│ Prefork ....... Disabled  PID ................. 1 │
└───────────────────────────────────────────────────┘
Then check the galleries:

curl http://localhost:8080/models/available

The output will be a long JSON list of every model in the configured galleries, just as in the CPU setup.

Now that we have that set up, let's go download a model, either by Downloading It directly or by using the Gallery!

Easy Setup - Stable Diffusion

Section under construction