Running DeepSeek-R1-0528 Dynamic 1-bit GGUFs

Author: Admin
Posted: 25-06-09 11:02

DeepSeek-R1-0528 is DeepSeek's new update to its R1 reasoning model. R1-0528 is the world's most powerful open-source model, rivalling OpenAI's GPT-4.5 and o3 and Google's Gemini 2.5 Pro.

DeepSeek also released a distilled version of R1-0528, produced by fine-tuning Qwen3 (8B). The distilled model achieves performance comparable to Qwen3 (235B). Qwen3 GGUF: DeepSeek-R1-0528-Qwen3-8B-GGUF
You can also fine-tune the Qwen3 model with Unsloth.

You can run the model in your preferred inference framework using Unsloth's 1.78-bit Dynamic 2.0 GGUFs. We quantized DeepSeek's 671B-parameter R1 model from 720GB down to 185GB, a size reduction of about 75%.
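The size reduction quoted above can be checked with a line of arithmetic. This is a minimal sketch; the function name is made up for illustration, and the only inputs are the 720GB and 185GB figures from the text.

```python
def size_reduction_percent(original_gb, quantized_gb):
    """Percentage by which the quantized file is smaller than the original."""
    return 100.0 * (1.0 - quantized_gb / original_gb)


# 720GB -> 185GB, the figures quoted in the text.
print(f"{size_reduction_percent(720, 185):.1f}%")  # prints 74.3%
```

The exact figure is about 74.3%, which the text rounds to 75%.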

Recommended: read our complete guide for a walkthrough on how to run DeepSeek-R1-0528 locally.

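To get the best tradeoff between accuracy and size, we do not quantize every layer to the same precision. Instead, we selectively quantize the MoE layers to lower bit widths and leave attention and other layers at 4 or 6 bits.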

The full set of DeepSeek-R1-0528 GGUFs is available in the unsloth/DeepSeek-R1-0528-GGUF repository on Hugging Face.

How to run DeepSeek-R1-0528

DeepSeek-R1-0528-Qwen3-8B fits in almost any setup, including machines with less than 20GB of RAM. No advance preparation is needed.

The Qwen3 distill and the full R1-0528 model use the same settings and chat template.

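According to DeepSeek, these are the recommended settings for R1 inference (R1-0528 should use the same settings):
- Set the temperature to 0.6 to reduce repetition and incoherence.
- Set top_p to 0.95 (recommended).
- Run multiple tests and average the results for a reliable evaluation.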
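The settings above can be kept in one place and reused across runs. The sketch below is only an illustration: the helper names are hypothetical, the flag spellings follow the llama-cli commands used later in this post, and the scores in the usage line are made-up placeholders rather than real results.

```python
from statistics import mean

# Recommended sampling settings quoted in the list above.
RECOMMENDED = {"temp": 0.6, "top_p": 0.95}


def sampling_flags(settings=RECOMMENDED):
    """Turn the settings dict into command-line flags such as --temp 0.6."""
    flags = []
    for name, value in settings.items():
        flags += [f"--{name}", str(value)]
    return flags


def average_score(scores):
    """Average the scores of several independent runs of the same test."""
    if not scores:
        raise ValueError("need at least one run")
    return mean(scores)


print(sampling_flags())
# Placeholder pass/fail scores from four hypothetical runs.
print(average_score([1.0, 0.0, 1.0, 1.0]))
```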

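We recommend at least 64GB of RAM to run this quant (expect about 1 token/s without a GPU). For optimal performance you need at least 180GB of unified memory, or 180GB of combined RAM+VRAM, for 5+ tokens/s. Running the model without a GPU is technically possible, but we do not recommend it unless you are using Apple's unified memory chips.

For the 1.78-bit quantization:
- On 1x 24GB GPU (with all layers offloaded), you can expect up to 20 tokens/s of throughput and around 4 tokens/s for single-user inference.
- Try to have RAM + VRAM that together add up to the size of the quant you are downloading (e.g. 183GB = at least 183GB of VRAM+RAM or unified memory).
- A 24GB GPU such as the RTX 4090 should achieve about 3 tokens/s, depending on workload and configuration.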
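The rule of thumb above (RAM + VRAM should add up to the size of the quant) is easy to check before starting a large download. A minimal sketch follows; the function name and the example machine sizes are assumptions made for illustration, and only the 183GB figure comes from the text.

```python
def fits_in_memory(quant_gb, ram_gb, vram_gb=0.0):
    """Return True when RAM + VRAM (or unified memory) covers the quant size."""
    return ram_gb + vram_gb >= quant_gb


# Hypothetical machines checked against the 183GB example from the text.
print(fits_in_memory(183, ram_gb=160, vram_gb=24))  # 184GB total
print(fits_in_memory(183, ram_gb=64, vram_gb=24))   # 88GB total
```

A machine that fails this check may still run the model, but per the figures above it should be expected to be much slower.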

How to run R1-0528-Qwen3-8B in Ollama:

Install ollama if you haven't already! With it you can only run models up to 32B in size. To run the full 720GB R1-0528 model, see the llama.cpp section below.

apt-get update
apt-get install pciutils -y
curl -fsSL https://ollama.com/install.sh | sh

Run the model! If it fails, you can call ollama serve in another terminal. We include all our fixes and suggested parameters (temperature, etc.) in params in our Hugging Face upload.

ollama run hf.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_XL

To disable thinking, use the following (or set it in the system prompt):

>>> Write your prompt here
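Once ollama serve is running, the same model can also be queried from a script. The sketch below assumes Ollama's local REST endpoint at http://localhost:11434/api/generate and sets the recommended temperature and top_p explicitly. The helper names are hypothetical, and the host and port are the usual defaults rather than anything stated in this post.

```python
import json
import urllib.request

# Model tag taken from the ollama run command above.
MODEL = "hf.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_XL"


def build_payload(prompt, temperature=0.6, top_p=0.95):
    """Build the JSON body for a single non-streaming generation request."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature, "top_p": top_p},
    }


def generate(prompt, host="http://localhost:11434"):
    """Send the prompt to a locally running Ollama server (assumed host/port)."""
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling generate("Write your prompt here") needs the server to be up and the model already pulled; build_payload can be inspected on its own without a server.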

✨ How to run R1-0528 in llama.cpp:

Get the latest llama.cpp from GitHub and follow the build instructions below. If you don't have a GPU or only want CPU inference, change -DGGML_CUDA=ON to -DGGML_CUDA=OFF.

apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-quantize llama-cli llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp
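If the build is scripted, the CUDA switch can be chosen automatically. The sketch below reuses the cmake flags from the commands above. Treating an nvidia-smi binary on PATH as a sign that a CUDA GPU is present is a heuristic assumed here for illustration, and both function names are hypothetical.

```python
import shutil


def has_nvidia_gpu():
    """Heuristic (assumption): an nvidia-smi binary on PATH means a GPU exists."""
    return shutil.which("nvidia-smi") is not None


def cmake_configure_args(use_cuda):
    """Return the cmake configure command with GGML_CUDA switched ON or OFF."""
    cuda = "ON" if use_cuda else "OFF"
    return [
        "cmake", "llama.cpp", "-B", "llama.cpp/build",
        "-DBUILD_SHARED_LIBS=OFF", f"-DGGML_CUDA={cuda}", "-DLLAMA_CURL=ON",
    ]


print(" ".join(cmake_configure_args(has_nvidia_gpu())))
```

The returned list can be passed to subprocess.run(..., check=True) from the directory that contains the llama.cpp checkout.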

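To load the model directly with llama.cpp, run the command below. The suffix :IQ1_S is the quantization type. You can also download the files from Hugging Face first (see the download step below). This works much like ollama run. Use export LLAMA_CACHE="folder" to force llama.cpp to save the download to a specific location.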

export LLAMA_CACHE="unsloth/DeepSeek-R1-0528-GGUF"
./llama.cpp/llama-cli \
    -hf unsloth/DeepSeek-R1-0528-GGUF:IQ1_S \
    --cache-type-k q4_0 \
    --threads -1 \
    --n-gpu-layers 99 \
    --prio 3 \
    --temp 0.6 \
    --top_p 0.95 \
    --min_p 0.01 \
    --ctx-size 16384 \
    --seed 3407 \
    -ot ".ffn_.*_exps.=CPU"
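The -ot argument in the command above pairs a regular expression with a target device. The sketch below shows which tensor names that pattern would select, on the assumption that the pattern is searched within each tensor name. The tensor names are illustrative examples of the usual GGUF naming scheme, not a listing read from the actual model file, and the function name is hypothetical.

```python
import re

# The override string from the command above: "<regex>=<device>".
OVERRIDE = ".ffn_.*_exps.=CPU"
PATTERN, DEVICE = OVERRIDE.split("=", 1)


def placement(tensor_name):
    """Return the override device if the pattern matches, else 'default'."""
    return DEVICE if re.search(PATTERN, tensor_name) else "default"


# Illustrative tensor names (assumed, not read from the real GGUF).
for name in [
    "blk.3.ffn_gate_exps.weight",   # MoE expert weights
    "blk.3.ffn_down_exps.weight",   # MoE expert weights
    "blk.3.attn_q.weight",          # attention weights
    "blk.3.ffn_gate_shexp.weight",  # shared-expert weights
]:
    print(f"{name:32s} -> {placement(name)}")
```

Under these assumptions only the expert tensors are sent to the CPU, and everything else follows the --n-gpu-layers setting.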

Download the model (after running pip install huggingface_hub hf_transfer). You can choose UD-IQ1_S (the dynamic 1.78-bit quant) or another quantized version such as Q4_K_M. To balance size and accuracy, we recommend the 2.7-bit dynamic quant UD-Q2_K_XL.

# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0" # Can sometimes rate limit, so set to 0 to disable
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id = "unsloth/DeepSeek-R1-0528-GGUF",
    local_dir = "unsloth/DeepSeek-R1-0528-GGUF",
    allow_patterns = ["*UD-IQ1_S*"], # Dynamic 1bit (185GB) Use "*UD-Q2_K_XL*" for Dynamic 2bit (251GB)
)
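The download produces a multi-part GGUF, and the llama-cli commands in this post point --model at the first part (the file ending in 00001-of-00004.gguf). The sketch below locates that first shard inside the download folder. The function name and the glob pattern are assumptions based on the file name shown in this post.

```python
from pathlib import Path


def first_shard(local_dir, quant="UD-IQ1_S"):
    """Return the path of the first GGUF shard for the given quant."""
    matches = sorted(Path(local_dir).glob(f"**/*{quant}*-00001-of-*.gguf"))
    if not matches:
        raise FileNotFoundError(f"no first shard for {quant} under {local_dir}")
    return matches[0]
```

For example, first_shard("unsloth/DeepSeek-R1-0528-GGUF") returns the path to pass to --model, and raises FileNotFoundError if the download has not finished or the quant name is wrong.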

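Run Unsloth's Flappy Bird test, as described in our 1.58-bit Dynamic Quant for DeepSeek R1 post. Edit --threads (e.g. --threads 32) to set the number of CPU threads, --ctx-size 16384 to set the context length, and --n-gpu-layers (e.g. --n-gpu-layers 2) to set how many layers are offloaded to the GPU. Try lowering --n-gpu-layers if your GPU runs out of memory, and remove it entirely for CPU-only inference.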

./llama.cpp/llama-cli \
    --model unsloth/DeepSeek-R1-0528-GGUF/UD-IQ1_S/DeepSeek-R1-0528-UD-IQ1_S-00001-of-00004.gguf \
    --cache-type-k q4_0 \
    --threads -1 \
    --n-gpu-layers 99 \
    --prio 3 \
    --temp 0.6 \
    --top_p 0.95 \
    --min_p 0.01 \
    --ctx-size 16384 \
    --seed 3407 \
    -ot ".ffn_.*_exps.=CPU" \
    -no-cnv \
    --prompt "<｜User｜>Create a Flappy Bird game in Python. You must include these things:\n1. You must use pygame.\n2. The background color should be randomly chosen and is a light shade. Start with a light blue color.\n3. Pressing SPACE multiple times will accelerate the bird.\n4. The bird's shape should be randomly chosen as a square, circle or triangle. The color should be randomly chosen as a dark color.\n5. Place on the bottom some land colored as dark brown or yellow chosen randomly.\n6. Make a score shown on the top right side. Increment if you pass pipes and don't hit them.\n7. Make randomly spaced pipes with enough space. Color them randomly as dark green or light brown or a dark gray shade.\n8. When you lose, show the best score. Make the text inside the screen. Pressing q or Esc will quit the game. Restarting is pressing SPACE again.\nThe final game should be inside a markdown section in Python. Check your code for errors and fix them before the final markdown section.<｜Assistant｜>"

We also test our dynamic quants with the Heptagon test, which checks whether the model can write a basic physics engine that simulates balls bouncing inside a spinning, enclosed heptagon.

./llama.cpp/llama-cli \
    --model unsloth/DeepSeek-R1-0528-GGUF/UD-IQ1_S/DeepSeek-R1-0528-UD-IQ1_S-00001-of-00004.gguf \
    --cache-type-k q4_0 \
    --threads -1 \
    --n-gpu-layers 99 \
    --prio 3 \
    --temp 0.6 \
    --top_p 0.95 \
    --min_p 0.01 \
    --ctx-size 16384 \
    --seed 3407 \
    -ot ".ffn_.*_exps.=CPU" \
    -no-cnv \
    --prompt "<｜User｜>Write a Python program that shows 20 balls bouncing inside a spinning heptagon:\n- All balls have the same radius.\n- All balls have a number on it from 1 to 20.\n- All balls drop from the heptagon center when starting.\n- Colors are: #f8b862, #f6ad49, #f39800, #f08300, #ec6d51, #ee7948, #ed6d3d, #ec6800, #ec6800, #ee7800, #eb6238, #ea5506, #ea5506, #eb6101, #e49e61, #e45e32, #e17b34, #dd7a56, #db8449, #d66a35\n- The balls should be affected by gravity and friction, and they must bounce off the rotating walls realistically. There should also be collisions between balls.\n- The material of all the balls determines that their impact bounce height will not exceed the radius of the heptagon, but higher than ball radius.\n- All balls rotate with friction, the numbers on the ball can be used to indicate the spin of the ball.\n- The heptagon is spinning around its center, and the speed of spinning is 360 degrees per 5 seconds.\n- The heptagon size should be large enough to contain all the balls.\n- Do not use the pygame library; implement collision detection algorithms and collision response etc. by yourself. The following Python libraries are allowed: tkinter, math, numpy, dataclasses, typing, sys.\n- All codes should be put in a single Python file.<｜Assistant｜>"
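Both test prompts wrap the request in the same two markers, <｜User｜> before the message and <｜Assistant｜> after it. The sketch below builds a prompt of that shape for any message. The function name is hypothetical, and this covers only the single-turn form seen in the commands in this post, not the full chat template.

```python
# Markers copied from the prompts above (note the full-width bar characters).
USER_TAG = "<｜User｜>"
ASSISTANT_TAG = "<｜Assistant｜>"


def build_prompt(user_message):
    """Wrap a single user message in the single-turn format used above."""
    return f"{USER_TAG}{user_message}{ASSISTANT_TAG}"


print(build_prompt("Create a Flappy Bird game in Python."))
```

The resulting string can be passed as the --prompt value together with -no-cnv, as in the commands above.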

Thank you!

Thank you for your continued support. We hope to have some good news in the coming weeks!
