
 Ollama - Offline Generative AI, Similar to ChatGPT

TSxxboxx
post May 26 2025, 08:41 PM



QUOTE(c2tony @ May 26 2025, 08:17 PM)
Lately the gemma3 12b update is annoying; it gets split across my CPU & GPU and just won't run at 100% GPU anymore.
CODE

ollama ps
NAME                 ID              SIZE     PROCESSOR          UNTIL
gemma3:12b-it-qat    5d4fa005e7bb    12 GB    31%/69% CPU/GPU    4 minutes from now

*
How many GB of VRAM do you have? Windows also uses some of it, so if you have 12GB that's not enough. When there isn't enough VRAM, Ollama offloads some layers to the CPU, and that causes the slowdown.
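
If you want to see how much VRAM is actually free before the model loads, something like this works (assuming an NVIDIA card; nvidia-smi ships with the driver):
CODE
# Check total/used/free VRAM before loading the model
nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv

# Then load the model and see how ollama split it
ollama run gemma3:12b-it-qat "hi"
ollama ps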
c2tony
post May 26 2025, 09:48 PM



QUOTE(xxboxx @ May 26 2025, 08:41 PM)
How many GB of VRAM do you have? Windows also uses some of it, so if you have 12GB that's not enough. When there isn't enough VRAM, Ollama offloads some layers to the CPU, and that causes the slowdown.
*
12GB VRAM, I know...
For comparison, Qwen3 14b at 9.3GB uses 100% of my GPU:

CODE
ollama ps
NAME         ID              SIZE     PROCESSOR    UNTIL
qwen3:14b    7d7da67570e2    10 GB    100% GPU     4 minutes from now


CODE
NAME                          ID              SIZE      MODIFIED    
gemma3:12b-it-qat             5d4fa005e7bb    8.9 GB    2 weeks ago
qwen3:14b                     7d7da67570e2    9.3 GB    3 weeks ago


This post has been edited by c2tony: May 26 2025, 10:09 PM
ipohps3
post May 26 2025, 09:55 PM



Dunno about you guys.

I was enthusiastic about open models earlier this year, with DeepSeek in January and other open models released in the following months.

However, with Google Gemini 2.5 released over the last month or two, I don't think I want to go back to open models: Gemini+DeepMind is getting extremely good at almost everything, and none of the open models that can run on an RTX 3090 come close.

After some time, paying the 20 USD per month is more productive for getting things done than using open models.

This post has been edited by ipohps3: May 26 2025, 09:56 PM
c2tony
post May 26 2025, 10:08 PM



QUOTE(ipohps3 @ May 26 2025, 09:55 PM)
Dunno about you guys.

I was enthusiastic about open models earlier this year, with DeepSeek in January and other open models released in the following months.

However, with Google Gemini 2.5 released over the last month or two, I don't think I want to go back to open models: Gemini+DeepMind is getting extremely good at almost everything, and none of the open models that can run on an RTX 3090 come close.

After some time, paying the 20 USD per month is more productive for getting things done than using open models.
*
Yeah, you paid, and that's the whole point! Paying makes sense for "one-off" or "batch" work. I wouldn't pay $20 for my use case.

Gemini is a closed system: you don't get to tweak it, audit its training data, or run it locally.
For some users, that trade-off is worth it.
For others, it's not. Not to mention the API has usage limits and can't be used offline.
TSxxboxx
post May 26 2025, 11:39 PM



QUOTE(c2tony @ May 26 2025, 09:48 PM)
12GB VRAM, I know...
For comparison, Qwen3 14b at 9.3GB uses 100% of my GPU:

CODE
ollama ps
NAME         ID              SIZE     PROCESSOR    UNTIL
qwen3:14b    7d7da67570e2    10 GB    100% GPU     4 minutes from now


CODE
NAME                          ID              SIZE      MODIFIED    
gemma3:12b-it-qat             5d4fa005e7bb    8.9 GB    2 weeks ago
qwen3:14b                     7d7da67570e2    9.3 GB    3 weeks ago

*
There you go, not enough VRAM.
Why is your gemma3:12b-it-qat 12GB? On the ollama page it is only 8.9GB.
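
Might be worth checking what the model was built with too; ollama show prints the parameter count, quantization, and context length (assuming a reasonably recent ollama):
CODE
# Inspect parameter count, quantization and context length of the local model
ollama show gemma3:12b-it-qat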

QUOTE(ipohps3 @ May 26 2025, 09:55 PM)
Dunno about you guys.

I was enthusiastic about open models earlier this year, with DeepSeek in January and other open models released in the following months.

However, with Google Gemini 2.5 released over the last month or two, I don't think I want to go back to open models: Gemini+DeepMind is getting extremely good at almost everything, and none of the open models that can run on an RTX 3090 come close.

After some time, paying the 20 USD per month is more productive for getting things done than using open models.
*
Gemini has indeed got a lot better, and so has ChatGPT. I just use it for fun, so I didn't pay for the more capable model; maybe that's why the free model still feels less capable than open-source models. A question like this, Gemini 2.5 Pro still got wrong:

user posted image
ipohps3
post May 27 2025, 01:27 AM



QUOTE(xxboxx @ May 26 2025, 11:39 PM)
There you go, not enough VRAM.
Why is your gemma3:12b-it-qat 12GB? On the ollama page it is only 8.9GB.
Gemini has indeed got a lot better, and so has ChatGPT. I just use it for fun, so I didn't pay for the more capable model; maybe that's why the free model still feels less capable than open-source models. A question like this, Gemini 2.5 Pro still got wrong:

user posted image
*
Yeah, sometimes it gets the basics wrong; I tried ChatGPT and it seems to get it right. But anyway, I don't use it for trivial stuff like this. I mainly use the YouTube video analysis, deep research, audio overview podcast, and canvas features, for coding and for research on new topics. The main thing is the large 1M context window, which nobody can support locally at home, even if you have an open model that supports a 1M context window.

This post has been edited by ipohps3: May 27 2025, 01:28 AM
TSxxboxx
post May 27 2025, 10:13 PM



QUOTE(ipohps3 @ May 27 2025, 01:27 AM)
Yeah, sometimes it gets the basics wrong; I tried ChatGPT and it seems to get it right. But anyway, I don't use it for trivial stuff like this. I mainly use the YouTube video analysis, deep research, audio overview podcast, and canvas features, for coding and for research on new topics. The main thing is the large 1M context window, which nobody can support locally at home, even if you have an open model that supports a 1M context window.
*
For suggesting new ideas or perspectives an LLM is useful, but for analysis and research I find the LLM misses what is important to me, and in the end I still have to do the analysis and research myself.

A Mac Studio with 512GB RAM can handle a 1M context window or more, but the price sweat.gif
Maybe the upcoming Intel GPUs for AI will solve the memory bottleneck.
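
Rough back-of-envelope on why the RAM needs to be that big: the KV cache grows linearly with context. With illustrative numbers (f16 cache, a hypothetical 48-layer model with 8 KV heads of dim 128; real models differ):
CODE
# bytes per token = 2 (K and V) x layers x KV heads x head dim x 2 bytes (f16)
echo $((2 * 48 * 8 * 128 * 2))                                 # 393216 bytes, 384 KB per token
echo $((2 * 48 * 8 * 128 * 2 * 1000000 / 1024 / 1024 / 1024))  # ~366 GB for a 1M-token cache

And that is on top of the model weights themselves.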
c2tony
post May 27 2025, 11:21 PM



QUOTE(xxboxx @ May 26 2025, 11:39 PM)
There you go, not enough VRAM.
Why is your gemma3:12b-it-qat 12GB? On the ollama page it is only 8.9GB.


That's why I say that after the update, a 12b model that is 8.9GB on disk becomes 12GB when you actually run it.

Gemini's response: it gets de-quantized or processed at higher precision during runtime, depending on the architecture, the runtime precision of the activations and KV cache, and the optimizations applied by the inference framework. Gemma 3's multimodal nature and potentially different KV cache handling seem to be key contributors to its higher observed runtime memory usage compared to Qwen3 14B models of similar quantization.

This means it's time for a higher-VRAM GPU, or an upgrade to an NPU.
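
Before buying anything, maybe try shrinking the runtime side first. Recent ollama builds have these knobs (the saving varies by model, and KV cache quantization needs flash attention enabled):
CODE
# Set for the ollama server process, then restart it
export OLLAMA_FLASH_ATTENTION=1     # enable flash attention
export OLLAMA_KV_CACHE_TYPE=q8_0    # quantize the KV cache to roughly half the f16 size
ollama serve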
TSxxboxx
post May 28 2025, 11:48 AM



QUOTE(c2tony @ May 27 2025, 11:21 PM)
That's why I say that after the update, a 12b model that is 8.9GB on disk becomes 12GB when you actually run it.

Gemini's response: it gets de-quantized or processed at higher precision during runtime, depending on the architecture, the runtime precision of the activations and KV cache, and the optimizations applied by the inference framework. Gemma 3's multimodal nature and potentially different KV cache handling seem to be key contributors to its higher observed runtime memory usage compared to Qwen3 14B models of similar quantization.

This means it's time for a higher-VRAM GPU, or an upgrade to an NPU.
*
I see what you mean now: gemma3:latest is 3.3GB, but when run it uses 6.0GB.
Maybe ollama added more functions, and that also keeps increasing the memory usage.
I saw this; just raising the context length already increases memory usage a lot:
https://github.com/open-webui/open-webui/discussions/8303
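
Going the other way, capping the context claws a lot of that back; num_ctx is the standard ollama option for it (4096 here is just an example value):
CODE
# Cap the context window for one interactive session
ollama run gemma3:12b
>>> /set parameter num_ctx 4096

# Or per request through the local API
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:12b",
  "prompt": "hello",
  "options": { "num_ctx": 4096 }
}'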
TSxxboxx
post May 28 2025, 12:13 PM



I think it's because with the new v0.7.0 engine, ollama added support for multimodal models, and this increases memory usage significantly:
QUOTE
Ollama now supports multimodal models via Ollama’s new engine, starting with new vision multimodal models:

Meta Llama 4
Google Gemma 3
Qwen 2.5 VL
Mistral Small 3.1
and more vision models.


I checked a few models' size on disk vs actually loaded:
phi4:latest 9.1GB becomes 10GB
deepseek-r1:14b 9.0GB becomes 10GB
MiMo-7B-RL-GGUF:Q8_0 8.1GB becomes 9.6GB
gemma3:12b 8.1GB becomes 11GB
gemma3:latest 3.3GB becomes 6.0GB
llama3.2:latest 2.0GB becomes 4.0GB
granite3.2-vision:2b-fp16 6.0GB becomes 8.8GB

Models that support vision grow a lot more than models without vision.
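
If anyone wants to repeat the check, a loop like this should do it (substitute whatever models you have pulled):
CODE
# Load each model with a short prompt, record the loaded size, then unload it
for m in phi4:latest deepseek-r1:14b gemma3:12b gemma3:latest llama3.2:latest; do
  ollama run "$m" "hi" > /dev/null
  ollama ps
  ollama stop "$m"
done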
TSxxboxx
post May 28 2025, 12:39 PM



QUOTE(xxboxx @ Oct 26 2024, 07:25 PM)
chow1942, using minicpm-v can you get all the text from this image correctly?
user posted image

I only got this much using it
user posted image

But using one of the online servers it got very close to complete and correct
user posted image

I wonder if my parameters are wrong or if it's an ollama/open-webui engine issue.

Using llama 3.2 vision on one of the online servers also gives it correctly, but then it probably ran out of tokens
user posted image

I also tried ChatGPT and everything is almost correct
user posted image
*
Ollama's vision after the update is a lot better than a few months ago.

Using gemma3:12b there is still some wrong data, but it's a lot better than before
user posted image

Qwen 2.5 also recently updated its vision model, and it is more accurate than gemma3 even though it's only 7b vs 12b
user posted image

Even with a bigger picture that has more data, it still gets most things right
user posted image
c2tony
post May 30 2025, 10:12 PM



QUOTE(xxboxx @ May 28 2025, 12:39 PM)
Ollama's vision after the update is a lot better than a few months ago.

Using gemma3:12b there is still some wrong data, but it's a lot better than before
user posted image

*
Geez... You must be doing a lot of OCR? tongue.gif

They get better and better, with a lot of added rules and regulations; the uncensored wild west is disappearing lol.gif
ipohps3
post May 30 2025, 10:19 PM



Anyone tried Gemma 3n 4B?
TSxxboxx
post May 30 2025, 10:28 PM



QUOTE(c2tony @ May 30 2025, 10:12 PM)
Geez... You must be doing a lot of OCR? tongue.gif

They get better and better, with a lot of added rules and regulations; the uncensored wild west is disappearing lol.gif
*
Yup, mainly for handwriting in that kind of table format. Other OCR apps such as Snipping Tool or OneNote can't correctly recognize all the text, or can't preserve the table format. I was using ChatGPT before this, but after a few photos I'd already hit the daily cap for free users. I then used Gemini, but it is not as accurate as ChatGPT. Now I can rely on ollama to do the OCR.

Pros and cons. The memory-usage penalty is the hardest to swallow; hopefully in future they can bring it back down.
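
For anyone who wants to script it instead of going through open-webui, the local API accepts base64 images. Roughly like this (qwen2.5vl:7b and photo.jpg are just placeholders; base64 -w0 is the GNU flag, macOS's base64 doesn't need -w0):
CODE
# OCR one image through the local ollama API
curl http://localhost:11434/api/generate -d "{
  \"model\": \"qwen2.5vl:7b\",
  \"prompt\": \"Transcribe all text in this image, preserving the table layout\",
  \"images\": [\"$(base64 -w0 photo.jpg)\"],
  \"stream\": false
}"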

This post has been edited by xxboxx: May 30 2025, 10:32 PM
c2tony
post May 30 2025, 10:29 PM



QUOTE(ipohps3 @ May 30 2025, 10:19 PM)
Anyone tried Gemma 3n 4B?
*
What platform did you use on the mobile/edge side? I only installed PocketPal with llama-3.2-1b-instruct; mobile has too many distractions grin.gif
TSxxboxx
post May 30 2025, 10:31 PM



QUOTE(ipohps3 @ May 30 2025, 10:19 PM)
Anyone tried Gemma 3n 4B?
*
Gemma3:4B?

I tried it; it's less capable than the 12B model.
c2tony
post May 30 2025, 10:39 PM



QUOTE(xxboxx @ May 30 2025, 10:31 PM)
Gemma3:4B?

I tried it; it's less capable than the 12B model.
*
I think he means this:

user posted image
TSxxboxx
post May 31 2025, 08:35 AM



QUOTE(c2tony @ May 30 2025, 10:39 PM)
I think he means this:

user posted image
*
Oh, no wonder I didn't see it; I only checked the ollama website and it's not there yet.
c2tony
post Jun 4 2025, 11:08 PM



QUOTE(ipohps3 @ May 30 2025, 10:19 PM)
Anyone tried Gemma 3n 4B?
*
QUOTE(xxboxx @ May 31 2025, 08:35 AM)
Oh, no wonder I didn't see it; I only checked the ollama website and it's not there yet.
*
I downloaded https://github.com/google-ai-edge/gallery/releases/tag/1.0.3 and tried Gemma-3n-E4B-it-int4 on my phone today.
My Honor Magic 6 Pro turned into a hand warmer, at 3.51 tokens/s.
It's lower when multitasking, and I don't have the patience, so I just closed it tongue.gif

There's a YouTuber talking about it:
https://youtu.be/Vb8L5mtjLDo?si=fxp9nddnJ8zsuO08


ipohps3
post Jun 4 2025, 11:15 PM



Anyone tried the DeepSeek R1 0528 Qwen distilled version?

How is it?
