
 Ollama - Offline Generative AI, Similar to ChatGPT

TSxxboxx
post May 26 2025, 08:41 PM



QUOTE(c2tony @ May 26 2025, 08:17 PM)
Lately the gemma3 12b update is annoying; it gets split across my CPU & GPU and just won't run at 100% GPU anymore.
CODE

ollama ps
NAME                 ID              SIZE     PROCESSOR          UNTIL
gemma3:12b-it-qat    5d4fa005e7bb    12 GB    31%/69% CPU/GPU    4 minutes from now

*
How many GB of VRAM do you have? Windows also uses some of it, so if you have 12GB that's not enough. When there isn't enough VRAM, Ollama offloads some layers to the CPU, and that causes the slowdown.
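
If you want to see how much VRAM is actually free before the model loads, something like this works (assuming an NVIDIA card; nvidia-smi ships with the driver):
CODE
# Check total/used/free VRAM before loading the model
nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv

# Then load the model and see how ollama split it
ollama run gemma3:12b-it-qat "hi"
ollama ps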
c2tony
post May 26 2025, 09:48 PM



QUOTE(xxboxx @ May 26 2025, 08:41 PM)
How many GB of VRAM do you have? Windows also uses some of it, so if you have 12GB that's not enough. When there isn't enough VRAM, Ollama offloads some layers to the CPU, and that causes the slowdown.
*
12GB VRAM, I know...
For comparison, Qwen3 14b at 9.3GB uses 100% of my GPU:

CODE
ollama ps
NAME         ID              SIZE     PROCESSOR    UNTIL
qwen3:14b    7d7da67570e2    10 GB    100% GPU     4 minutes from now


CODE
NAME                          ID              SIZE      MODIFIED    
gemma3:12b-it-qat             5d4fa005e7bb    8.9 GB    2 weeks ago
qwen3:14b                     7d7da67570e2    9.3 GB    3 weeks ago


This post has been edited by c2tony: May 26 2025, 10:09 PM
ipohps3
post May 26 2025, 09:55 PM



Dunno about you guys.

I was enthusiastic about open models earlier this year, with DeepSeek in January and other open models released in the following months.

However, with Google Gemini 2.5 released over the last month or two, I don't think I want to go back to open models: Gemini+DeepMind is getting extremely good at almost everything, and none of the open models that can run on an RTX 3090 come close.

After some time, paying the 20 USD per month is more productive for getting things done than using open models.

This post has been edited by ipohps3: May 26 2025, 09:56 PM
c2tony
post May 26 2025, 10:08 PM



QUOTE(ipohps3 @ May 26 2025, 09:55 PM)
Dunno about you guys.

I was enthusiastic about open models earlier this year, with DeepSeek in January and other open models released in the following months.

However, with Google Gemini 2.5 released over the last month or two, I don't think I want to go back to open models: Gemini+DeepMind is getting extremely good at almost everything, and none of the open models that can run on an RTX 3090 come close.

After some time, paying the 20 USD per month is more productive for getting things done than using open models.
*
Yeah, you paid, and that's the whole point! Paying makes sense for "one-off" or "batch" work. I wouldn't pay $20 for my use case.

Gemini is a closed system: you don't get to tweak it, audit its training data, or run it locally.
For some users, that trade-off is worth it.
For others, it's not. Not to mention the API has usage limits and can't be used offline.
TSxxboxx
post May 26 2025, 11:39 PM



QUOTE(c2tony @ May 26 2025, 09:48 PM)
12GB VRAM, I know...
For comparison, Qwen3 14b at 9.3GB uses 100% of my GPU:

CODE
ollama ps
NAME         ID              SIZE     PROCESSOR    UNTIL
qwen3:14b    7d7da67570e2    10 GB    100% GPU     4 minutes from now


CODE
NAME                          ID              SIZE      MODIFIED    
gemma3:12b-it-qat             5d4fa005e7bb    8.9 GB    2 weeks ago
qwen3:14b                     7d7da67570e2    9.3 GB    3 weeks ago

*
There you go, not enough VRAM.
Why is your gemma3:12b-it-qat 12GB? On the ollama page it is only 8.9GB.
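
Might be worth checking what the model was built with too; ollama show prints the parameter count, quantization, and context length (assuming a reasonably recent ollama):
CODE
# Inspect parameter count, quantization and context length of the local model
ollama show gemma3:12b-it-qat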

QUOTE(ipohps3 @ May 26 2025, 09:55 PM)
Dunno about you guys.

I was enthusiastic about open models earlier this year, with DeepSeek in January and other open models released in the following months.

However, with Google Gemini 2.5 released over the last month or two, I don't think I want to go back to open models: Gemini+DeepMind is getting extremely good at almost everything, and none of the open models that can run on an RTX 3090 come close.

After some time, paying the 20 USD per month is more productive for getting things done than using open models.
*
Gemini has indeed got a lot better, and so has ChatGPT. I just use it for fun, so I didn't pay for the more capable model; maybe that's why the free model still feels less capable than open-source models. A question like this, Gemini 2.5 Pro still got wrong:

user posted image
ipohps3
post May 27 2025, 01:27 AM



QUOTE(xxboxx @ May 26 2025, 11:39 PM)
There you go, not enough VRAM.
Why is your gemma3:12b-it-qat 12GB? On the ollama page it is only 8.9GB.
Gemini has indeed got a lot better, and so has ChatGPT. I just use it for fun, so I didn't pay for the more capable model; maybe that's why the free model still feels less capable than open-source models. A question like this, Gemini 2.5 Pro still got wrong:

user posted image
*
Yeah, sometimes it gets the basics wrong; I tried ChatGPT and it seems to get it right. But anyway, I don't use it for trivial stuff like this. I mainly use the YouTube video analysis, deep research, audio overview podcast, and canvas features, for coding and for research on new topics. The main thing is the large 1M context window, which nobody can support locally at home, even if you have an open model that supports a 1M context window.

This post has been edited by ipohps3: May 27 2025, 01:28 AM
TSxxboxx
post May 27 2025, 10:13 PM



QUOTE(ipohps3 @ May 27 2025, 01:27 AM)
Yeah, sometimes it gets the basics wrong; I tried ChatGPT and it seems to get it right. But anyway, I don't use it for trivial stuff like this. I mainly use the YouTube video analysis, deep research, audio overview podcast, and canvas features, for coding and for research on new topics. The main thing is the large 1M context window, which nobody can support locally at home, even if you have an open model that supports a 1M context window.
*
For suggesting new ideas or perspectives an LLM is useful, but for analysis and research I find the LLM misses what is important to me, and in the end I still have to do the analysis and research myself.

A Mac Studio with 512GB RAM can handle a 1M context window or more, but the price sweat.gif
Maybe the upcoming Intel GPUs for AI will solve the memory bottleneck.
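
Rough back-of-envelope on why the RAM needs to be that big: the KV cache grows linearly with context. With illustrative numbers (f16 cache, a hypothetical 48-layer model with 8 KV heads of dim 128; real models differ):
CODE
# bytes per token = 2 (K and V) x layers x KV heads x head dim x 2 bytes (f16)
echo $((2 * 48 * 8 * 128 * 2))                                 # 393216 bytes, 384 KB per token
echo $((2 * 48 * 8 * 128 * 2 * 1000000 / 1024 / 1024 / 1024))  # ~366 GB for a 1M-token cache

And that is on top of the model weights themselves.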
c2tony
post May 27 2025, 11:21 PM



QUOTE(xxboxx @ May 26 2025, 11:39 PM)
There you go, not enough VRAM.
Why is your gemma3:12b-it-qat 12GB? On the ollama page it is only 8.9GB.


That's why I say that after the update, a 12b model that is 8.9GB on disk becomes 12GB when you actually run it.

Gemini's response: it gets de-quantized or processed at higher precision during runtime, depending on the architecture, the runtime precision of the activations and KV cache, and the optimizations applied by the inference framework. Gemma 3's multimodal nature and potentially different KV cache handling seem to be key contributors to its higher observed runtime memory usage compared to Qwen3 14B models of similar quantization.

This means it's time for a higher-VRAM GPU, or an upgrade to an NPU.
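
Before buying anything, maybe try shrinking the runtime side first. Recent ollama builds have these knobs (the saving varies by model, and KV cache quantization needs flash attention enabled):
CODE
# Set for the ollama server process, then restart it
export OLLAMA_FLASH_ATTENTION=1     # enable flash attention
export OLLAMA_KV_CACHE_TYPE=q8_0    # quantize the KV cache to roughly half the f16 size
ollama serve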
TSxxboxx
post May 28 2025, 11:48 AM



QUOTE(c2tony @ May 27 2025, 11:21 PM)
That's why I say that after the update, a 12b model that is 8.9GB on disk becomes 12GB when you actually run it.

Gemini's response: it gets de-quantized or processed at higher precision during runtime, depending on the architecture, the runtime precision of the activations and KV cache, and the optimizations applied by the inference framework. Gemma 3's multimodal nature and potentially different KV cache handling seem to be key contributors to its higher observed runtime memory usage compared to Qwen3 14B models of similar quantization.

This means it's time for a higher-VRAM GPU, or an upgrade to an NPU.
*
I see what you mean now: gemma3:latest is 3.3GB, but when run it uses 6.0GB.
Maybe ollama added more functions, and that also keeps increasing the memory usage.
I saw this; just raising the context length already increases memory usage a lot:
https://github.com/open-webui/open-webui/discussions/8303
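
Going the other way, capping the context claws a lot of that back; num_ctx is the standard ollama option for it (4096 here is just an example value):
CODE
# Cap the context window for one interactive session
ollama run gemma3:12b
>>> /set parameter num_ctx 4096

# Or per request through the local API
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:12b",
  "prompt": "hello",
  "options": { "num_ctx": 4096 }
}'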
TSxxboxx
post May 28 2025, 12:13 PM



I think it's because with the new v0.7.0 engine, ollama added support for multimodal models, and this increases memory usage significantly:
QUOTE
Ollama now supports multimodal models via Ollama’s new engine, starting with new vision multimodal models:

Meta Llama 4
Google Gemma 3
Qwen 2.5 VL
Mistral Small 3.1
and more vision models.


I checked a few models' size on disk vs actually loaded:
phi4:latest 9.1GB becomes 10GB
deepseek-r1:14b 9.0GB becomes 10GB
MiMo-7B-RL-GGUF:Q8_0 8.1GB becomes 9.6GB
gemma3:12b 8.1GB becomes 11GB
gemma3:latest 3.3GB becomes 6.0GB
llama3.2:latest 2.0GB becomes 4.0GB
granite3.2-vision:2b-fp16 6.0GB becomes 8.8GB

Models that support vision grow a lot more than models without vision.
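
If anyone wants to repeat the check, a loop like this should do it (substitute whatever models you have pulled):
CODE
# Load each model with a short prompt, record the loaded size, then unload it
for m in phi4:latest deepseek-r1:14b gemma3:12b gemma3:latest llama3.2:latest; do
  ollama run "$m" "hi" > /dev/null
  ollama ps
  ollama stop "$m"
done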
TSxxboxx
post May 28 2025, 12:39 PM



QUOTE(xxboxx @ Oct 26 2024, 07:25 PM)
chow1942, using minicpm-v can you get all the text from this image correctly?
user posted image

I only got this much using it
user posted image

But using one of the online servers it got very close to complete and correct
user posted image

I wonder if my parameters are wrong or if it's an ollama/open-webui engine issue.

Using llama 3.2 vision on one of the online servers also gives it correctly, but then it probably ran out of tokens
user posted image

I also tried ChatGPT and everything is almost correct
user posted image
*
Ollama's vision after the update is a lot better than a few months ago.

Using gemma3:12b there is still some wrong data, but it's a lot better than before
user posted image

Qwen 2.5 also recently updated its vision model, and it is more accurate than gemma3 even though it's only 7b vs 12b
user posted image

Even with a bigger picture that has more data, it still gets most things right
user posted image
c2tony
post May 30 2025, 10:12 PM



QUOTE(xxboxx @ May 28 2025, 12:39 PM)
Ollama's vision after the update is a lot better than a few months ago.

Using gemma3:12b there is still some wrong data, but it's a lot better than before
user posted image

*
Geez... You must be doing a lot of OCR? tongue.gif

They get better and better, with a lot of added rules and regulations; the uncensored wild west is disappearing lol.gif
ipohps3
post May 30 2025, 10:19 PM



Anyone tried Gemma 3n 4B?
TSxxboxx
post May 30 2025, 10:28 PM



QUOTE(c2tony @ May 30 2025, 10:12 PM)
Geez... You must be doing a lot of OCR? tongue.gif

They get better and better, with a lot of added rules and regulations; the uncensored wild west is disappearing lol.gif
*
Yup, mainly for handwriting in that kind of table format. Other OCR apps such as Snipping Tool or OneNote can't correctly recognize all the text, or can't preserve the table format. I was using ChatGPT before this, but after a few photos I'd already hit the daily cap for free users. I then used Gemini, but it is not as accurate as ChatGPT. Now I can rely on ollama to do the OCR.

Pros and cons. The memory-usage penalty is the hardest to swallow; hopefully in future they can bring it back down.
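
For anyone who wants to script it instead of going through open-webui, the local API accepts base64 images. Roughly like this (qwen2.5vl:7b and photo.jpg are just placeholders; base64 -w0 is the GNU flag, macOS's base64 doesn't need -w0):
CODE
# OCR one image through the local ollama API
curl http://localhost:11434/api/generate -d "{
  \"model\": \"qwen2.5vl:7b\",
  \"prompt\": \"Transcribe all text in this image, preserving the table layout\",
  \"images\": [\"$(base64 -w0 photo.jpg)\"],
  \"stream\": false
}"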

This post has been edited by xxboxx: May 30 2025, 10:32 PM
c2tony
post May 30 2025, 10:29 PM



QUOTE(ipohps3 @ May 30 2025, 10:19 PM)
Anyone tried Gemma 3n 4B?
*
What platform did you use on the mobile/edge side? I only installed PocketPal with llama-3.2-1b-instruct; mobile has too many distractions grin.gif
TSxxboxx
post May 30 2025, 10:31 PM



QUOTE(ipohps3 @ May 30 2025, 10:19 PM)
Anyone tried Gemma 3n 4B?
*
Gemma3:4B?

I tried it; it's less capable than the 12B model.
c2tony
post May 30 2025, 10:39 PM



QUOTE(xxboxx @ May 30 2025, 10:31 PM)
Gemma3:4B?

I tried it; it's less capable than the 12B model.
*
I think he means this:

user posted image
TSxxboxx
post May 31 2025, 08:35 AM



QUOTE(c2tony @ May 30 2025, 10:39 PM)
I think he means this:

user posted image
*
Oh, no wonder I didn't see it; I only checked the ollama website and it's not there yet.
c2tony
post Jun 4 2025, 11:08 PM



QUOTE(ipohps3 @ May 30 2025, 10:19 PM)
Anyone tried Gemma 3n 4B?
*
QUOTE(xxboxx @ May 31 2025, 08:35 AM)
Oh, no wonder I didn't see it; I only checked the ollama website and it's not there yet.
*
I downloaded https://github.com/google-ai-edge/gallery/releases/tag/1.0.3 and tried Gemma-3n-E4B-it-int4 on my phone today.
My Honor Magic 6 Pro turned into a hand warmer, at 3.51 tokens/s.
It's lower when multitasking, and I don't have the patience, so I just closed it tongue.gif

There's a YouTuber talking about it:
https://youtu.be/Vb8L5mtjLDo?si=fxp9nddnJ8zsuO08


ipohps3
post Jun 4 2025, 11:15 PM



Anyone tried the DeepSeek R1 0528 Qwen distilled version?

How is it?
