 Ollama - Offline Generative AI, Similar to ChatGPT

TSxxboxx
post May 2 2025, 10:24 PM

The mind is for having ideas, not holding them
*******
Senior Member
5,261 posts

Joined: Oct 2004
From: J@Y B33


QUOTE(c2tony @ May 2 2025, 02:05 PM)
llama3.2:3b-instruct-fp16  after 2+ min answered: In the word "benzodiazepines", the letter "e" appears three times.
while smollm2:1.7b-instruct-fp16 gave me TypeError: NetworkError when attempting to fetch

XiaoMi's MiMo LLM is relatively new.
After all, they're all LLMs riding the same "highway": pattern recognition. If AI starts to actually understand, then we might need to worry about their consciousness awakening laugh.gif
*
I'm using the 2GB llama3.2 model, maybe that's why the answer is not good
TSxxboxx
post May 26 2025, 08:41 PM



QUOTE(c2tony @ May 26 2025, 08:17 PM)
Lately the gemma3 12b update has been annoying: it gets split across my CPU & GPU and just won't run at 100% GPU anymore.
CODE

ollama ps
NAME                 ID              SIZE     PROCESSOR          UNTIL
gemma3:12b-it-qat    5d4fa005e7bb    12 GB    31%/69% CPU/GPU    4 minutes from now

*
How much VRAM does your GPU have? Windows also uses some VRAM, so if it's 12GB, that's not enough. When there isn't enough VRAM, Ollama offloads part of the model to the CPU, which causes the slowdown.
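As a rough sanity check, here's a sketch of the fit test Ollama is effectively doing. The 1.5 GB Windows/desktop overhead is my own guess, not an Ollama number, and the real split also depends on context length and how layers divide:

```python
def predict_processor(model_gb, vram_gb, os_overhead_gb=1.5):
    """Guess whether a model fits fully in VRAM or gets split to CPU.

    os_overhead_gb is an assumed allowance for what Windows and the
    desktop already occupy; tune it for your own machine.
    """
    free = vram_gb - os_overhead_gb
    if model_gb <= free:
        return "100% GPU"
    gpu_pct = max(0, min(100, round(free / model_gb * 100)))
    return f"{100 - gpu_pct}%/{gpu_pct}% CPU/GPU"

print(predict_processor(9.3, 12))   # qwen3:14b fits
print(predict_processor(12.0, 12))  # gemma3:12b-it-qat at runtime gets split
```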
TSxxboxx
post May 26 2025, 11:39 PM



QUOTE(c2tony @ May 26 2025, 09:48 PM)
12GB VRAM, I know...
Qwen3 at 14b and 9.3GB uses 100% of my GPU, for comparison

CODE
ollama ps
NAME         ID              SIZE     PROCESSOR    UNTIL
qwen3:14b    7d7da67570e2    10 GB    100% GPU     4 minutes from now


CODE
NAME                          ID              SIZE      MODIFIED    
gemma3:12b-it-qat             5d4fa005e7bb    8.9 GB    2 weeks ago
qwen3:14b                     7d7da67570e2    9.3 GB    3 weeks ago

*
There you go: not enough VRAM.
Why is your gemma3:12b-it-qat 12GB? The Ollama page lists it as only 8.9GB

QUOTE(ipohps3 @ May 26 2025, 09:55 PM)
Dunno about you guys.

I was enthusiastic about open models earlier this year, with DeepSeek in January and other open models released in the following months.

However, since Google Gemini 2.5 was released last month, I don't think I want to go back to open models. Gemini+DeepMind is getting extremely good at almost everything, and none of the open models that can run on an RTX 3090 come close to it.

After some time, paying the 20 USD per month is more productive for me to get things done than using open models.
*
Gemini has indeed gotten a lot better, and so has ChatGPT. I just use it for fun, so I haven't paid for the more capable model; maybe that's why the free model still feels less capable than open-source models. Gemini 2.5 Pro still gets a question like this wrong

user posted image
TSxxboxx
post May 27 2025, 10:13 PM



QUOTE(ipohps3 @ May 27 2025, 01:27 AM)
yeah, sometimes it gets the basics wrong. I tried on ChatGPT and it seems to get it right. But anyway, I don't use it for trivial stuff; I mainly use the YouTube video analysis, deep research, audio overview podcast, and canvas features for coding and researching new topics. The main thing is its large 1M context window, which no one can support locally at home, even with an open model that supports a 1M context window.
*
For suggesting new ideas or perspectives an LLM is useful, but for analysis and research I find LLMs miss what is important to me, and in the end I still have to do the analysis and research myself.

A Mac Studio with 512GB RAM can handle a 1M context window or more, but the price sweat.gif
Maybe the upcoming Intel GPUs for AI will solve the memory bottleneck.
TSxxboxx
post May 28 2025, 11:48 AM



QUOTE(c2tony @ May 27 2025, 11:21 PM)
That's why I said that after the update, a 12b model that is 8.9GB on disk becomes 12GB when you actually run it.

Gemini's response: it gets de-quantized or processed at higher precision during runtime. The architecture, the runtime precision of activations and the KV cache, and the optimizations applied by the inference framework all matter.
Gemma 3's multimodal nature and potentially different KV cache handling seem to be key contributors to its higher observed runtime memory usage compared to Qwen 2 14B models of similar quantization.

This means it's time for a higher-VRAM GPU, or an upgrade to an NPU
*
I see what you mean now. gemma3:latest is 3.3GB on disk but uses 6.0GB when running.
Maybe Ollama keeps adding more features that also increase memory usage.
I saw this: just raising the context length increases memory usage a lot
https://github.com/open-webui/open-webui/discussions/8303
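The KV cache is the main reason context length costs so much memory. Here's a back-of-the-envelope estimate; the layer/head numbers are made up for illustration, not any specific model's real config:

```python
def kv_cache_gb(ctx_len, n_layers=48, n_kv_heads=8, head_dim=128, bytes_per_val=2):
    # Per token, per layer: K and V, each n_kv_heads * head_dim values,
    # stored at bytes_per_val (2 bytes for fp16).
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_val * ctx_len / 1024**3

print(kv_cache_gb(8_192))    # ~1.5 GB at 8k context
print(kv_cache_gb(131_072))  # ~24 GB at 128k context
```

The cost scales linearly with context length, so a 16x longer context means a 16x bigger KV cache.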
TSxxboxx
post May 28 2025, 12:13 PM



I think it's because with the v0.7.0 new engine, Ollama added support for multimodal models, and this increases memory usage significantly
QUOTE
Ollama now supports multimodal models via Ollama’s new engine, starting with new vision multimodal models:

Meta Llama 4
Google Gemma 3
Qwen 2.5 VL
Mistral Small 3.1
and more vision models.


I checked a few models, size on disk vs actually loaded:
phi4:latest 9.1 GB becomes 10 GB
deepseek-r1:14b 9.0 GB becomes 10 GB
MiMo-7B-RL-GGUF:Q8_0 8.1 GB becomes 9.6 GB
gemma3:12b 8.1 GB becomes 11 GB
gemma3:latest 3.3 GB becomes 6.0 GB
llama3.2:latest 2.0 GB becomes 4.0 GB
granite3.2-vision:2b-fp16 6.0 GB becomes 8.8 GB

Models that support vision grow a lot more when loaded than models without vision
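From the numbers in the list above, the loaded-vs-disk ratio per model can be computed directly:

```python
# (disk GB, loaded GB) as reported by `ollama list` and `ollama ps` above
models = {
    "phi4:latest": (9.1, 10.0),
    "deepseek-r1:14b": (9.0, 10.0),
    "MiMo-7B-RL-GGUF:Q8_0": (8.1, 9.6),
    "gemma3:12b": (8.1, 11.0),
    "gemma3:latest": (3.3, 6.0),
    "llama3.2:latest": (2.0, 4.0),
    "granite3.2-vision:2b-fp16": (6.0, 8.8),
}
for name, (disk, loaded) in models.items():
    print(f"{name:26s} {loaded / disk:.2f}x of disk size when loaded")
```

One caveat the ratios reveal: small llama3.2 doubles despite having no vision, which suggests part of the overhead is a fixed cost (context buffers and the like) that weighs heaviest on small models.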
TSxxboxx
post May 28 2025, 12:39 PM



QUOTE(xxboxx @ Oct 26 2024, 07:25 PM)
chow1942 using minicpm-v can you get all the text from this image correctly?
user posted image

I only got this much using it
user posted image

But using one of the online server it got very close to complete and correct
user posted image

I wonder if my parameters are not correct, or if it's an ollama/open-webui engine issue.

using llama 3.2 vision on one of the online servers also gives it correctly, but then it probably ran out of tokens
user posted image

I also tried on ChatGPT and everything is almost correct
user posted image
*
Ollama's vision after the update is now a lot better than it was a few months ago

Using gemma3:12b there is some wrong data, but it's a lot better than before
user posted image

Qwen 2.5 also recently updated its vision model, and it is more accurate than gemma3 even though it's only 7b vs 12b
user posted image

Even with a bigger picture that has more data, it still gets most things right
user posted image
TSxxboxx
post May 30 2025, 10:28 PM



QUOTE(c2tony @ May 30 2025, 10:12 PM)
Geez... you must be using a lot of OCR? tongue.gif

They get better and better with a lot of added rules and regulations; the uncensored wild west is disappearing lol.gif
*
Yup, mainly for handwriting in that kind of table format. Other OCR apps such as Snipping Tool or OneNote can't correctly recognize all the text, or can't preserve the table format. I was using ChatGPT before this, but after a few photos I had already hit the daily cap for free users. I then used Gemini, but it is not as accurate as ChatGPT. Now I can rely on Ollama for OCR.

Pros and cons. The penalty in memory usage is the hardest to swallow. Hopefully in the future they can bring the memory usage back down.

This post has been edited by xxboxx: May 30 2025, 10:32 PM
TSxxboxx
post May 30 2025, 10:31 PM



QUOTE(ipohps3 @ May 30 2025, 10:19 PM)
anyone tried Gemma 3n 4B ?
*
Gemma3:4B?

I tried it; it's less capable than the 12B model
TSxxboxx
post May 31 2025, 08:35 AM



QUOTE(c2tony @ May 30 2025, 10:39 PM)
I think he mean this:

user posted image
*
Oh, no wonder I didn't see it. I only checked the ollama website, and it's not there yet
TSxxboxx
post Jun 4 2025, 11:24 PM



QUOTE(c2tony @ Jun 4 2025, 11:08 PM)
I downloaded https://github.com/google-ai-edge/gallery/releases/tag/1.0.3 and tried Gemma-3n-E4B-it-int4 on my phone today.
My Honor Magic 6 Pro turned into a hand warmer, 3.51 tokens/s.
It's lower when multitasking, and I don't have the patience, so I just closed it tongue.gif

there's a youtuber talking about it
https://youtu.be/Vb8L5mtjLDo?si=fxp9nddnJ8zsuO08
*
These small-B LLMs feel good only for simple tasks like rewriting text, simple OCR, and calculations, as shown in the advert. I tried it: it can detect a receipt and do the calculation, but it is very bad at handwriting, which needs more complex OCR, and at counting letters in longer words

user posted image

user posted image
user posted image

user posted image
TSxxboxx
post Jun 5 2025, 02:08 PM



QUOTE(ipohps3 @ Jun 4 2025, 11:15 PM)
anyone tried the DeepSeek R1 0528 Qwen distilled version?

how is it?
*
I tried the 8b model, and it seems to have a bug where it doesn't stop thinking and starts hallucinating about something else after a few minutes.

I ask: A robot has 8 arms. There are 5 objects on a table: a knife, a fork, a spoon, a teddy bear and a doll. The robot picks up each object with an arm. He then shakes hands with himself. How many arms does he have free?

After 10 minutes and 95k characters of going off topic, I stopped it. This is its thinking. I can't show all the text because it's over the LYN post limit
[spoiler: thinking log omitted]

TSxxboxx
post Jun 5 2025, 02:09 PM



The remaining thinking before I hit stop
[spoiler: thinking log omitted]

TSxxboxx
post Jun 5 2025, 02:12 PM



With the 14b model, it took just 3 minutes of thinking before it came up with the answer
CODE
The robot has 8 arms. After picking up each of the 5 objects with an arm, he uses 2 more arms to shake hands with himself. Therefore, the number of free arms is 8 - (5 + 2) = 1.

Answer: The robot has 1 arm free.



[spoiler: thinking log omitted]


This post has been edited by xxboxx: Jun 5 2025, 02:13 PM
TSxxboxx
post Jun 5 2025, 03:32 PM



I tried on Gemma too

gemma3:4b
QUOTE
Let's analyze the situation.
The robot has 8 arms.
The robot picks up each of the 5 objects with an arm. So, 5 arms are used to pick up the objects.
The robot shakes hands with himself. This means the robot uses 2 arms to shake hands with itself.
The total number of arms used is 5 (picking up objects) + 2 (shaking hands) = 7 arms.
The number of arms free is the total number of arms the robot has minus the number of arms used.
The number of arms free is 8 - 7 = 1.

Now we write out the argument and find the final answer.
The robot has 8 arms.
The robot picks up each of the 5 objects with an arm. So, the number of arms used to pick up objects is 5.
The robot shakes hands with himself. This means the robot uses 2 arms to shake hands with itself.
Total number of arms used = 5 (picking up objects) + 2 (shaking hands) = 7.
The number of arms free = 8 (total arms) - 7 (arms used) = 1.

Final Answer: The final answer is  1

When asked about e in “defenselessness”, it says: There are three “e”s in the word “defenselessness”.


gemma3:12b-it-qat
QUOTE
This is a bit of a trick question! Here's the breakdown:

  Picking up objects: The robot uses 5 arms to pick up the 5 objects.
  Shaking hands with himself: This means he uses two arms (one for each hand) to shake his own hand.

Therefore:

  Total arms used: 5 (picking up) + 2 (handshake) = 7 arms
  Arms free: 8 (total) - 7 (used) = 1 arm

Answer: The robot has 1 arm free.

It's designed to make you think about the handshake being a more complicated process, but it's a simple use of two arms!

When asked about e in “defenselessness”, it says: There are three "e"s in the word "defenselessness".


gemma3:12b thinks differently
QUOTE
This is a trick question! Here's how to solve it:

  Picking up objects: The robot uses 5 arms to pick up the 5 objects.
  Shaking hands with himself: This is the key! A robot can't shake hands with himself. It's a nonsensical action. The question is designed to distract you.

Therefore, the robot has 8 arms free. The actions described don't reduce the number of free arms.

When asked about e in “defenselessness”, it says: Let's count them! There are three "e"s in "defenselessness".
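For the record, all three answers above miscount. A quick check shows "defenselessness" has five e's, not three (and llama3.2's earlier "benzodiazepines" answer of three was actually correct):

```python
for word in ("defenselessness", "benzodiazepines"):
    print(word, "->", word.count("e"))
# defenselessness -> 5
# benzodiazepines -> 3
```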


I asked another question to the 3 Gemma models
CODE
Bob has three boxes in front of him - Box A, Box B and Box C. Bob does not know what is in the boxes. Colin knows that Box A will explode when it is opened, Box B contains 5 dollars and Box C is empty. Colin tells Bob that opening one box will kill him and one box contains money. Should Bob open a box?


gemma3:12b
QUOTE
This is a clever logic puzzle! Here's how to break it down and the correct course of action for Bob.

Understanding the Information

  Colin's Statements:
      "Opening one box will kill him." (This means one box is the killer box - Box A)
      "One box contains money." (This means Box B has the money)
  Bob's Knowledge: Bob doesn't know what's in the boxes. He only knows Colin's statements.

Analyzing the Statements

Colin is telling the truth, but he's doing it in a way that is designed to mislead Bob. Let's consider what each statement *implies*:

  "Opening one box will kill him."  This doesn't tell Bob which box is the killer. It simply confirms that there is a killer box.
  "One box contains money." This confirms that there *is* a box with money.

The Solution

Bob should not open a box.

Here's why:

  Colin's statements are designed to trick Bob into thinking the killer box is obvious. However, they don't provide enough information for Bob to be certain.
  If Bob opens a box, there's a high probability he will die. Even if Colin is being truthful, Bob doesn't have enough information to make an informed decision about which box to open.
  The safest course of action is to avoid opening any of the boxes.

Important Note: This is a hypothetical puzzle. Do not attempt to replicate the situation described.



gemma3:12b-it-qat
QUOTE
This is a tricky logic puzzle! Here's how to break it down and the conclusion Bob should reach:

Understanding the Information

  Fact 1: Box A will kill Bob if opened.
  Fact 2: Box B contains $5.
  Fact 3: Box C is empty.
  Colin's Statements:
      "Opening one box will kill him." (This confirms Box A is the killer box)
      "One box contains money." (This confirms Box B has the money)

The Solution

Bob should not open a box.

Here's why:

  Colin's clues are redundant: Colin's statements are already known due to the setup of the puzzle. He's providing information Bob already knows.
  The risk is certain: The problem states that opening Box A will kill Bob.  There's no need to take a chance.


Important Note: This is a logic puzzle, and assumes perfect knowledge of the setup. In a real-world scenario, one should never take risks based on information from an unknown source!



gemma3:4b, on the other hand, is not so smart
QUOTE
This is a classic logic puzzle! Here’s how to solve it:

  Colin’s Information: Colin knows Box A explodes, Box B has money, and Box C is empty. This means Colin knows Box B contains the money.

  Bob’s Deduction: Bob knows Colin knows that Box B has money. Therefore, if Bob opens Box B, he will confirm Colin’s knowledge.

  The Solution: Bob should open Box A. This is the only way to disprove Colin’s information.

Let me know if you'd like to try another puzzle!

TSxxboxx
post Jun 5 2025, 07:20 PM



According to the Gemma page:
QUOTE
Quantization aware trained models (QAT)
The quantization aware trained Gemma 3 models preserve similar quality as half precision models (BF16) while maintaining a lower memory footprint (3x less compared to non-quantized models).
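The 3x figure is roughly what you'd expect from the parameter bytes alone: BF16 stores 2 bytes per weight and int4 half a byte, so a pure int4 file would be 4x smaller. Real QAT files land nearer 3x because some tensors (embeddings, for example) typically stay at higher precision. The arithmetic below is illustrative, not Gemma's actual layout:

```python
params = 12e9  # ~12B parameters, illustrative
bf16_gb = params * 2 / 1024**3    # 2 bytes per weight
int4_gb = params * 0.5 / 1024**3  # 4 bits per weight
print(f"BF16 ~{bf16_gb:.1f} GB, int4 ~{int4_gb:.1f} GB, {bf16_gb / int4_gb:.0f}x smaller")
```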
TSxxboxx
post Jun 7 2025, 11:19 AM



I can only salivate over such LLM performance. Waiting for the day Intel releases their B60 GPU with 24GB, hopefully at around a 2k price lol

TSxxboxx
post Jun 8 2025, 05:44 PM



QUOTE(c2tony @ Jun 8 2025, 12:17 PM)
This is too extreme! I don't do much with AI nowadays other than satisfying my curiosity, so perplexity.ai, Gemini and Copilot on the phone are more than enough.

ps: scanning every receipt and letting AI do the accounting looks like a great use of AI whistling.gif
*
Even as just a hobby, being able to run bigger-parameter models gets us a more intelligent AI. Like the comparison above: gemma3:12b is a lot more capable than gemma3:4b and similar to deepseek-r1:14b. With access to more VRAM we could run gemma3:27b or even deepseek-r1:70b, which should be even more capable.

I've been feeding gemma3:12b a few photos of handwriting, and each time it answered some part wrongly I corrected it. After a few rounds its recognition of the handwriting has improved a lot compared to the first time, but there are still some mistakes. With gemma3:27b and its higher intelligence there should be even fewer mistakes.
TSxxboxx
post Jun 12 2025, 08:29 PM



QUOTE(c2tony @ Jun 11 2025, 09:42 AM)
here's something interesting I found: AI processors with loads of RAM, used for larger models

https://youtu.be/B7GDr-VFuEo?si=mK-jvQuXkHwmptel
*
I watched the video; the Ryzen AI MAX+ 395 is indeed a powerful CPU for AI, even beating the M4. It's just that this CPU's price is still very high.

Maybe in 1 or 2 years we'll get such a powerful CPU at a mid-range price.
TSxxboxx
post Jun 13 2025, 12:51 AM



QUOTE(c2tony @ Jun 12 2025, 09:34 PM)
For the price it's better value: only change the processor, motherboard and RAM biggrin.gif Still better than a single GPU card at the same price whistling.gif
It's a relatively new processor; I've only seen the Intel Core Ultra ones around.
I haven't seen anyone selling the AMD AI processor yet, but you can get the AM5 8600G and 8700G for the same function
*
But the 8600G and 8700G have a different iGPU than the Ryzen AI MAX+ 395; do they have the same performance?
