 Ollama - Offline Generative AI, Similar to ChatGPT

c2tony
post Aug 21 2024, 07:57 PM


Phi3 has been updated to Phi3.5, and it gets it right this time.

QUOTE
Since all three of Sally's brothers share the same two sisters, it implies that these are also her siblings because in a family unit with multiple children like this one (including both male and female), there is only one set of sisters for each brother. Therefore, despite having three brothers, Sally has just one sister—the fact they all have "two" sisters at common refers to the same individual who counts once per sibling relationship in a family with multiple children sharing identical pairs among themselves. So, Sally indeed only has one biological sister.
c2tony
post Apr 30 2025, 05:24 PM


Does anyone know how to turn off that thinking stuff on DeepSeek or Qwen3? On the Ollama web UI, of course.

P/S: https://huggingface.co/jedisct1/MiMo-7B-RL-...f?download=true

You can play with Xiaomi's AI.

c2tony
post May 1 2025, 01:14 PM


QUOTE(xxboxx @ May 1 2025, 07:59 AM)
I remember when using DeepSeek, the thinking output isn't shown unless you press the arrow beside the model name.

How's the Xiaomi AI compared to DeepSeek? Better answers?
*
Yes, but I don't want it to show that arrow at all! It takes more time to generate those steps, whether you click to expand them or not.

I haven't managed to try MiMo yet; I don't know how to load a GGUF.
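One thing I want to try for this (I haven't verified either option myself, so treat both as assumptions): Qwen3 is documented to support a /no_think soft switch in the prompt, and newer Ollama builds reportedly have a think toggle. Note this is a Qwen3 feature; the DeepSeek R1 distills will most likely keep thinking regardless.
CODE
# Qwen3's documented soft switch: prepend /no_think to the prompt
ollama run qwen3:14b "/no_think your question here"

# newer Ollama builds reportedly accept a think flag as well
ollama run --think=false qwen3:14b "your question here"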
c2tony
post May 1 2025, 10:11 PM


QUOTE(xxboxx @ May 1 2025, 04:29 PM)
Oh, you mean you don't want it to do the thinking at all? I don't think you can; those models are designed for thinking. For questions that need deep thought, this kind of model is better than models that don't think. But for a straightforward question, such as a calculation, these models waste a lot of time getting to the obvious answer.

Using the terminal/CMD, type "ollama pull hf.co/jedisct1/MiMo-7B-RL-GGUF:Q8_0"
This will pull the Q8_0 8.1GB model.

If you want the smaller 4.7GB model, type "ollama pull hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M"

I tried it and the answers it gives feel as good as DeepSeek's. When fed data to analyze, it does take some time to process before giving the answer.
*
Thanks for the commands.
I tried it too.
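For reference, this is roughly the sequence I ran (the pull tag is straight from the quote above; running it under the same name is standard Ollama usage):
CODE
# pull the smaller Q4_K_M quantization from Hugging Face
ollama pull hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M

# then chat with it under the same name
ollama run hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M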

Are you familiar with the thought experiment of the Ship of Theseus,
in the field of identity metaphysics?
If those removed planks are restored and reassembled, free of the rot, is that the Ship of Theseus?

It "thought" about the third question for around 7 minutes.

Is neither the true ship, or are both the true ship?
- it's still thinking...
c2tony
post May 2 2025, 02:05 PM


QUOTE(xxboxx @ May 2 2025, 08:55 AM)
llama3.2:3b-instruct-fp16, after 2+ minutes, answered: In the word "benzodiazepines", the letter "e" appears three times.
Meanwhile smollm2:1.7b-instruct-fp16 gave me "TypeError: NetworkError when attempting to fetch".

XiaoMi's MiMo LLM is relatively new.
After all, they're all LLMs on the same "highway": pattern recognition. If AI starts to actually understand, then we might need to worry about their consciousness awakening.


c2tony
post May 26 2025, 08:17 PM


Lately the gemma3 12b update has been annoying: it gets split across my CPU & GPU and just won't run at 100% GPU anymore.
CODE

ollama ps
NAME                 ID              SIZE     PROCESSOR          UNTIL
gemma3:12b-it-qat    5d4fa005e7bb    12 GB    31%/69% CPU/GPU    4 minutes from now
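
To double-check where the memory actually goes, I compare what ollama reports with what the card itself reports (assuming an NVIDIA card here):
CODE
# Ollama's view: loaded model and the CPU/GPU split
ollama ps

# the GPU's view: how much VRAM is really in use, and by which processes
nvidia-smi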

c2tony
post May 26 2025, 09:48 PM


QUOTE(xxboxx @ May 26 2025, 08:41 PM)
How many GB of VRAM do you have? Windows also uses some VRAM, so if it's 12GB that's not enough. When there isn't enough VRAM it offloads part of the model to the CPU, which causes the slowdown.
*
12GB of VRAM, I know...
Qwen3 14b at 9.3GB uses 100% of my GPU, for comparison.

CODE
ollama ps
NAME         ID              SIZE     PROCESSOR    UNTIL
qwen3:14b    7d7da67570e2    10 GB    100% GPU     4 minutes from now


CODE
NAME                          ID              SIZE      MODIFIED    
gemma3:12b-it-qat             5d4fa005e7bb    8.9 GB    2 weeks ago
qwen3:14b                     7d7da67570e2    9.3 GB    3 weeks ago


c2tony
post May 26 2025, 10:08 PM


QUOTE(ipohps3 @ May 26 2025, 09:55 PM)
dunno about you guys.

I was enthusiastic about open models earlier this year, with DeepSeek in January and other open models released in the following months.

However, since Google released Gemini 2.5 in the last month or two, I don't think I want to go back to using open models: Gemini is getting extremely good at almost everything, and none of the open models that can run on an RTX 3090 come close to it.

After a while, paying the 20 USD per month is more productive for me to get things done than using open models.
*
Yeah, you paid. That's the whole point! "One-off" or "batch" processing is best when you pay; I wouldn't pay $20 for my use case.

Gemini is a closed system: you don't get to tweak it, audit its training data, or run it locally.
For some users, this trade-off is worth it.
For others, it's not. Not to mention the API limits, and it can't be used offline.
c2tony
post May 27 2025, 11:21 PM


QUOTE(xxboxx @ May 26 2025, 11:39 PM)
There you go, not enough VRAM.
Why is your gemma3:12b-it-qat 12GB? On the ollama page I see it is only 8.9GB.


That's why I said: after the update, a 12b model that is 8.9GB on disk becomes 12GB when you actually run it.

Gemini's response: it gets de-quantized or processed at a higher precision during runtime. The architecture, the runtime precision of the activations and KV cache, and the optimizations applied by the inference framework all contribute.
Gemma 3's multimodal nature and potentially different KV cache handling seem to be key contributors to its higher observed runtime memory usage compared to Qwen 2 14B models of similar quantization.

This means it's time for a higher-VRAM GPU, or an upgrade to an NPU.
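Before buying anything there are a few knobs I want to try first; they only shrink the context/KV-cache side, not the weights themselves, and I'm assuming a reasonably recent Ollama build:
CODE
# quantize the KV cache to q8_0 (needs flash attention), then restart the server
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0
ollama serve

# or load the model with a smaller context window from the interactive prompt
ollama run gemma3:12b-it-qat
>>> /set parameter num_ctx 4096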
c2tony
post May 30 2025, 10:12 PM


QUOTE(xxboxx @ May 28 2025, 12:39 PM)
Ollama's vision, after the update, is now a lot better than it was a few months ago.

Using gemma3:12b there is still some wrong data, but it's a lot better than before.
*
Geez... you must be doing a lot of OCR?

They get better and better, with a lot of added rules and regulations; the uncensored wild west is disappearing.
c2tony
post May 30 2025, 10:29 PM


QUOTE(ipohps3 @ May 30 2025, 10:19 PM)
anyone tried Gemma 3n 4B ?
*
What platform did you use, on the mobile or edge side? I've only installed PocketPal with llama-3.2-1b-instruct; mobile has lots of distractions.
c2tony
post May 30 2025, 10:39 PM


QUOTE(xxboxx @ May 30 2025, 10:31 PM)
Gemma3:4B?

I tried it; it's less capable than the 12B model.
*
I think he means this:

user posted image
c2tony
post Jun 4 2025, 11:08 PM


QUOTE(ipohps3 @ May 30 2025, 10:19 PM)
anyone tried Gemma 3n 4B ?
*
QUOTE(xxboxx @ May 31 2025, 08:35 AM)
Oh, no wonder I didn't see it; I only checked the ollama website and it isn't there yet.
*
I downloaded https://github.com/google-ai-edge/gallery/releases/tag/1.0.3 and tried Gemma-3n-E4B-it-int4 on my phone today.
My Honor Magic 6 Pro turned into a hand warmer, at 3.51 tokens/s.
It runs even slower when multitasking, and I didn't have the patience, so I just closed it.

There's a YouTuber talking about it:
https://youtu.be/Vb8L5mtjLDo?si=fxp9nddnJ8zsuO08


c2tony
post Jun 4 2025, 11:58 PM


QUOTE(ipohps3 @ Jun 4 2025, 11:15 PM)
anyone tried the DeepSeek R1 0528 Qwen distilled version?

how is it?
*
It couldn't answer:
CODE
how many e in “defenselessness”
It took more than 5 minutes and was still thinking, so I stopped it.
c2tony
post Jun 7 2025, 08:25 AM


QUOTE(ipohps3 @ Jun 5 2025, 03:37 PM)
what is this it-qat quantization?
*
it = instruction tuned, not that the model is fluent in the Italian language 😁

Quantization:
Conversion of the finished painting to a desired JPEG compression level.

QAT (Quantization-Aware Training):
Instead of compressing the photo into a smaller JPEG afterwards, we tell the artist to paint with fewer colors in the first place.

Hmm...... is that why Gemma3 occupies so much more memory, yet isn't that slow?

btw

IT-QAT refers to instruction-tuned Quantization-Aware Training (QAT) models, specifically in the Gemma 3 series. These models are optimized using QAT to maintain high quality while significantly reducing memory requirements, making them more efficient for deployment on consumer-grade GPUs.

For example:
- Gemma 3 27B IT-QAT → Reduced from 54GB to 14.1GB
- Gemma 3 12B IT-QAT → Reduced from 24GB to 6.6GB
- Gemma 3 4B IT-QAT → Reduced from 8GB to 2.6GB
- Gemma 3 1B IT-QAT → Reduced from 2GB to 0.5GB

These models are designed to preserve similar quality as half-precision models (BF16) while using less memory, making them ideal for running locally on devices with limited resources.
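The rough math behind those numbers (back-of-envelope only: BF16 is 2 bytes per parameter, int4 is about half a byte, plus some runtime overhead):
CODE
# bytes per parameter x parameter count
# BF16 : 27e9 params x 2 bytes   ≈ 54 GB
# int4 : 27e9 params x 0.5 byte  ≈ 13.5 GB  (+ overhead ≈ the 14.1 GB above)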

c2tony
post Jun 8 2025, 12:17 PM


QUOTE(xxboxx @ Jun 7 2025, 11:19 AM)
Can only salivate over such LLM performance. Waiting for the day Intel releases their B60 GPU with 24GB, hopefully at around a 2k price lol

This is too extreme! I don't do much with AI nowadays other than satisfying my curiosity, so perplexity.ai, Gemini and Copilot are more than enough on the phone.

P/S: scanning every receipt and letting AI do the accounting looks like a great use of AI
c2tony
post Jun 8 2025, 10:56 PM


QUOTE(xxboxx @ Jun 8 2025, 05:44 PM)
Even if it's only a hobby, being able to run bigger-parameter models gets us a more intelligent AI. Like the comparison above: gemma3:12b is a lot more capable than gemma3:4b and similar to deepseek-r1:14b. With access to more VRAM we could run gemma3:27b or even deepseek-r1:70b, which should be even more capable.

I've been feeding gemma3:12b a few photos of handwriting, and each time it gets some part wrong I correct it. After a few rounds its recognition of the handwriting has improved a lot compared to the first time, though there are still some mistakes. With gemma3:27b and its higher intelligence there would be even fewer mistakes.
*
ikr
Intel has been complacent with their processors;
hopefully they won't make the same mistake with GPUs this time.

There's no easy route to running AI locally, so let's hope for the Intel Arc GPUs.

Sometimes I feel the rush to grab one of those old 2080s modified to 22GB from China, but I chicken out.
c2tony
post Jun 11 2025, 09:42 AM


QUOTE(xxboxx @ Jun 8 2025, 05:44 PM)
Even if it's only a hobby, being able to run bigger-parameter models gets us a more intelligent AI. Like the comparison above: gemma3:12b is a lot more capable than gemma3:4b and similar to deepseek-r1:14b. With access to more VRAM we could run gemma3:27b or even deepseek-r1:70b, which should be even more capable.

I've been feeding gemma3:12b a few photos of handwriting, and each time it gets some part wrong I correct it. After a few rounds its recognition of the handwriting has improved a lot compared to the first time, though there are still some mistakes. With gemma3:27b and its higher intelligence there would be even fewer mistakes.
*
Here's something interesting I found: AI processors with loads of RAM, used for larger models.

https://youtu.be/B7GDr-VFuEo?si=mK-jvQuXkHwmptel
c2tony
post Jun 12 2025, 09:34 PM


QUOTE(xxboxx @ Jun 12 2025, 08:29 PM)
I watched the video; the Ryzen AI MAX+ 395 is indeed a powerful CPU for AI, it even beats the M4. It's just that this CPU's price is still very high.

Maybe in 1 or 2 years' time we'll get such a powerful CPU at a mid-range price.
*
For the price it's better value: you only change the processor, motherboard and RAM, and it's still better than a single GPU card at the same price.
It's a relatively new processor; so far I've only seen the Intel Core Ultra on sale.
I haven't seen anyone selling the AMD AI processors yet, but you can get the AM5 8600G or 8700G for the same function.
c2tony
post Jun 13 2025, 10:36 PM


QUOTE(xxboxx @ Jun 13 2025, 12:51 AM)
But the 8600G and 8700G have a different iGPU from the Ryzen AI MAX+ 395; do they have the same performance?
*
8700G = 16 TOPS
Ryzen AI MAX+ 395 = 55 TOPS
RTX 3060 12GB = 100 TOPS
Apple Mac Studio M4 Max = 38 TOPS

They can all run it.

BTW, 55 TOPS may sound like more AI power than 38 TOPS, but the way Apple handles data and optimizes memory usage can deliver equivalent or faster AI execution.
Even if your PC has 128GB of RAM, your GPU might be capped by its 24GB of VRAM when loading a large AI model.
With Apple's unified memory, you might comfortably run llama4:16x17b entirely in GPU-addressable space if you have 96GB of RAM.
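Rough sizing for that example, assuming the 16x17b variant is around 109B total parameters (my assumption; I haven't checked the exact on-disk size):
CODE
# total params x bytes per parameter at ~4-bit
# 109e9 params x ~0.5 byte ≈ 55 GB  -> well over a 24GB VRAM card,
#                                      but fits in 96GB of unified memory with room left for the KV cache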

