Getting Started with AI (Artificial Intelligence) in Linux / Ubuntu by Deploying LLMs (Large Language Models) with Ollama

In this quick tutorial we will deploy various AI models and compare the differences. We will be using smaller models so that most users should be able to follow this guide, as long as they have at least 16 GB of RAM and no other applications using much of it.

In this tutorial we install Ollama directly on the host using the official install script. You can also deploy Ollama with Docker; if you are not familiar with Docker, you can check our Docker Tutorial guide here.
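
If you do prefer containers, here is a minimal sketch using the official ollama/ollama image from Docker Hub (the named volume keeps downloaded models across container restarts; double-check the flags against the image's current documentation):

# run the Ollama server in a container, persisting models in a "ollama" volume
docker run -d --name ollama -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
# then run a model inside that container
docker exec -it ollama ollama run qwen2.5:0.5b

The rest of this guide assumes the native install below.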

Step 1.) Prepare a host

I recommend an Ubuntu or Debian host for this, but Ollama is supported on Windows and macOS too.

Visit the Ollama website (https://ollama.com) to see the download options for your platform; on Linux we will use the install script below.

Step 2.) Install Ollama

Install curl if you don't have it already (run as root, or prefix the commands with sudo): apt update && apt install curl

curl -fsSL https://ollama.com/install.sh | sh

>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
WARNING: Unable to detect NVIDIA/AMD GPU. Install lspci or lshw to automatically detect and install GPU dependencies.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
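
The GPU warning above just means the installer could not detect an NVIDIA/AMD GPU (installing lspci or lshw lets it auto-detect one); Ollama will fall back to CPU-only inference, which is what this guide uses. Since the installer reports the API at 127.0.0.1:11434, you can quickly verify it is reachable with curl from Step 2:

curl http://127.0.0.1:11434
# prints "Ollama is running" when the server is up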

Step 3.) Start the Ollama server

ollama serve &

Make sure you include the & so the server runs in the background while still printing its logs to the console. This is very useful for debugging and for troubleshooting performance issues.
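
By default the server binds to 127.0.0.1:11434, as you will see in the startup log below. If you need to reach it from another machine, one option (shown here as a sketch; OLLAMA_HOST is the variable visible in the server config dump below, and you should only expose the port on a trusted network) is:

# bind to all interfaces instead of localhost only
OLLAMA_HOST=0.0.0.0:11434 ollama serve &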


2025/04/22 21:00:21 routes.go:1231: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-04-22T21:00:21.697Z level=INFO source=images.go:458 msg="total blobs: 0"
time=2025-04-22T21:00:21.697Z level=INFO source=images.go:465 msg="total unused blobs removed: 0"
time=2025-04-22T21:00:21.697Z level=INFO source=routes.go:1298 msg="Listening on 127.0.0.1:11434 (version 0.6.5)"
time=2025-04-22T21:00:21.697Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-04-22T21:00:21.727Z level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
time=2025-04-22T21:00:21.727Z level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="267.6 GiB" available="255.5 GiB"
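
A few useful defaults stand out in the config dump above: the API listens on 127.0.0.1:11434, models are stored under /root/.ollama/models, and OLLAMA_CONTEXT_LENGTH (the context window) defaults to 2048 tokens. As a sketch, you could raise the context window by setting that same variable before starting the server; note that a larger context consumes more RAM for the KV cache:

# uses the OLLAMA_CONTEXT_LENGTH variable from the config dump above
OLLAMA_CONTEXT_LENGTH=8192 ollama serve &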

 

Step 4.) Run your first LLM!

Find the LLM you want to test in the model library on the Ollama website.

Keep in mind that the larger the model, the more memory, disk space, and resources it consumes when running. For this reason, our example uses a smaller model.
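
If you want to download a model without immediately opening a chat, ollama pull fetches it, and ollama list shows everything installed along with its size on disk:

ollama pull qwen2.5:0.5b   # download the model only
ollama list                # list installed models and their disk usage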

ollama run qwen2.5:0.5b

Note that this smaller 0.5b model is only about a 397 MB download.
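
You can sanity-check that figure against the metadata printed in the log below: the runner reports a file size of 373.71 MiB at 6.35 bits per weight for 494.03 M parameters, and 494.03 M x 6.35 bits / 8 bits per byte is roughly 392 MB (about 374 MiB), so the download is essentially just the quantized weights.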

You should see output similar to the below, ending with a chat prompt.

 ollama run qwen2.5:0.5b
[GIN] 2025/04/22 - 21:04:44 | 200 |      79.024µs |       127.0.0.1 | HEAD     "/"
[GIN] 2025/04/22 - 21:04:44 | 404 |     455.218µs |       127.0.0.1 | POST     "/api/show"
pulling manifest
time=2025-04-22T21:04:45.299Z level=INFO source=download.go:177 msg="downloading c5396e06af29 in 4 100 MB part(s)"
pulling c5396e06af29... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 397 MB                         
pulling 66b9ea09bd5b... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏   68 B                         
pulling eb4402837c78... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.5 KB                         
pulling 832dd9e00a68... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  11 KB                         
pulling 005f95c74751... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  490 B                         
verifying sha256 digest
writing manifest
success
[GIN] 2025/04/22 - 21:04:57 | 200 |   66.099807ms |       127.0.0.1 | POST     "/api/show"
⠙ time=2025-04-22T21:04:57.759Z level=INFO source=server.go:105 msg="system memory" total="267.6 GiB" free="255.5 GiB" free_swap="976.0 MiB"
time=2025-04-22T21:04:57.759Z level=WARN source=ggml.go:152 msg="key not found" key=qwen2.vision.block_count default=0
time=2025-04-22T21:04:57.759Z level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.key_length default=64
time=2025-04-22T21:04:57.759Z level=WARN source=ggml.go:152 msg="key not found" key=qwen2.attention.value_length default=64
time=2025-04-22T21:04:57.759Z level=INFO source=server.go:138 msg=offload library=cpu layers.requested=-1 layers.model=25 layers.offload=0 layers.split="" memory.available="[255.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="782.6 MiB" memory.required.partial="0 B" memory.required.kv="96.0 MiB" memory.required.allocations="[782.6 MiB]" memory.weights.total="373.7 MiB" memory.weights.repeating="235.8 MiB" memory.weights.nonrepeating="137.9 MiB" memory.graph.full="298.5 MiB" memory.graph.partial="405.0 MiB"
llama_model_loader: loaded meta data with 34 key-value pairs and 290 tensors from /root/.ollama/models/blobs/sha256-c5396e06af294bd101b30dce59131a76d2b773e76950acc870eda801d3ab0515 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 0.5B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5
llama_model_loader: - kv   5:                         general.size_label str              = 0.5B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-0...
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 0.5B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-0.5B
llama_model_loader: - kv  12:                               general.tags arr[str,2]       = ["chat", "text-generation"]
llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  14:                          qwen2.block_count u32              = 24
llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 896
llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 4864
llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 14
llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 2
llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  22:                          general.file_type u32              = 15
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
⠹ llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  121 tensors
llama_model_loader: - type q5_0:  132 tensors
llama_model_loader: - type q8_0:   13 tensors
llama_model_loader: - type q4_K:   12 tensors
llama_model_loader: - type q6_K:   12 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 373.71 MiB (6.35 BPW)
⠼ load: special tokens cache size = 22
⠴ load: token to piece cache size = 0.9310 MB
print_info: arch             = qwen2
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 494.03 M
print_info: general.name     = Qwen2.5 0.5B Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 151936
print_info: n_merges         = 151387
print_info: BOS token        = 151643 '<|endoftext|>'
print_info: EOS token        = 151645 '<|im_end|>'
print_info: EOT token        = 151645 '<|im_end|>'
print_info: PAD token        = 151643 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2025-04-22T21:04:58.219Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /root/.ollama/models/blobs/sha256-c5396e06af294bd101b30dce59131a76d2b773e76950acc870eda801d3ab0515 --ctx-size 8192 --batch-size 512 --threads 12 --no-mmap --parallel 4 --port 38569"
time=2025-04-22T21:04:58.219Z level=INFO source=sched.go:451 msg="loaded runners" count=1
time=2025-04-22T21:04:58.219Z level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
time=2025-04-22T21:04:58.220Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
time=2025-04-22T21:04:58.241Z level=INFO source=runner.go:853 msg="starting go runner"
time=2025-04-22T21:04:58.243Z level=INFO source=ggml.go:109 msg=system CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-04-22T21:04:58.250Z level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:38569"
[... the llama runner re-reads the GGUF file here and prints the same llama_model_loader metadata and print_info lines shown above; duplicate lines omitted ...]
⠇ time=2025-04-22T21:04:58.472Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
⠏ load: special tokens cache size = 22
⠋ load: token to piece cache size = 0.9310 MB
print_info: arch             = qwen2
print_info: vocab_only       = 0
print_info: n_ctx_train      = 32768
print_info: n_embd           = 896
print_info: n_layer          = 24
print_info: n_head           = 14
print_info: n_head_kv        = 2
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = 7
print_info: n_embd_k_gqa     = 128
print_info: n_embd_v_gqa     = 128
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-06
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: n_ff             = 4864
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 32768
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 1B
print_info: model params     = 494.03 M
print_info: general.name     = Qwen2.5 0.5B Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 151936
print_info: n_merges         = 151387
print_info: BOS token        = 151643 '<|endoftext|>'
print_info: EOS token        = 151645 '<|im_end|>'
print_info: EOT token        = 151645 '<|im_end|>'
print_info: PAD token        = 151643 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors:          CPU model buffer size =   373.71 MiB
⠹ llama_init_from_model: n_seq_max     = 4
llama_init_from_model: n_ctx         = 8192
llama_init_from_model: n_ctx_per_seq = 2048
llama_init_from_model: n_batch       = 2048
llama_init_from_model: n_ubatch      = 512
llama_init_from_model: flash_attn    = 0
llama_init_from_model: freq_base     = 1000000.0
llama_init_from_model: freq_scale    = 1
llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 24, can_shift = 1
llama_kv_cache_init:        CPU KV buffer size =    96.00 MiB
llama_init_from_model: KV self size  =   96.00 MiB, K (f16):   48.00 MiB, V (f16):   48.00 MiB
llama_init_from_model:        CPU  output buffer size =     2.33 MiB
⠸ llama_init_from_model:        CPU compute buffer size =   300.25 MiB
llama_init_from_model: graph nodes  = 846
llama_init_from_model: graph splits = 1
time=2025-04-22T21:04:58.973Z level=INFO source=server.go:619 msg="llama runner started in 0.75 seconds"
[GIN] 2025/04/22 - 21:04:58 | 200 |  1.319178734s |       127.0.0.1 | POST     "/api/generate"
>>> Send a message (/? for help)
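
At this prompt you can simply type a message and press Enter. A few handy slash-commands in recent Ollama versions (type /? at the prompt for the full list): /bye exits the chat, /show info prints details about the loaded model, and /set verbose enables per-response timing and token statistics, which is useful when comparing models:

>>> /set verbose
>>> Why is the sky blue?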

Checking the ollama process in top, we can see it uses about 553.7 MB of resident RAM (the RES column):

1589132 root      20   0 2874.7m 553.7m  21.5m S  1140   0.2   3:03.60 ollama
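
You can also ask Ollama itself with ollama ps, which lists the currently loaded models, their memory footprint, and whether they are running on CPU or GPU:

ollama ps

Finally, everything the chat prompt does goes through the HTTP API on 127.0.0.1:11434 (note the POST "/api/generate" entry in the log above), so you can script against it too. A minimal sketch with curl and the documented /api/generate endpoint:

curl http://127.0.0.1:11434/api/generate -d '{
  "model": "qwen2.5:0.5b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
# returns a single JSON object; the model's answer is in the "response" field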

