Populate README with project description, tech stack, quick start and tutorial contents

2026-04-03 11:48:39 +02:00
parent 0d6cbb0b69
commit 58b8bea857
2 changed files with 804 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,78 @@
 # TurboQuant_ROCm_Tutorial

+Step-by-step tutorial for building and running **llama.cpp** with **TurboQuant KV cache** compression on **AMD ROCm/HIP** GPUs.
+
+## Description
+
+This project is a single-page tutorial website that guides users through:
+
+- Installing and configuring ROCm 6.x on Ubuntu/Fedora
+- Building llama.cpp with HIP and TurboQuant support
+- Downloading and quantizing LLM models in GGUF format
+- Running inference, benchmarks and server mode with a quantized KV cache
+- Troubleshooting the most common errors
+
+TurboQuant enables aggressive KV cache quantization, down to ~1 bit for keys and 4 bits for values, significantly reducing VRAM usage.
+
+## Technologies
+
+| Category | Technology |
+|----------|------------|
+| **ML/LLM** | llama.cpp, TurboQuant, Heavy-Hitter Oracle (H2O) |
+| **GPU/Compute** | AMD ROCm 6.x, HIP, hipBLAS |
+| **GPU Arch** | RDNA2 (gfx1030), RDNA3 (gfx110x), RDNA4 (gfx120x), Strix Halo (gfx1151) |
+| **Build** | CMake 3.21+, gcc/g++ 12+, clang |
+| **Models** | GGUF format, Qwen2.5-7B (example) |
+| **Quantization** | `f16`, `q8_0`, `q4_0`, `tq1_0` (~1-bit), `tq4_0` (4-bit) |
+
+## Project Structure
+
+```
+├── index.html    # Single-page tutorial with embedded CSS and JS
+├── README.md     # This file
+└── LICENSE       # ISC license
+```
+
+## Quick Start
+
+The full tutorial is in the `index.html` file. You can open it directly in your browser:
+
+```bash
+# Open the tutorial in the default browser
+xdg-open index.html
+```
+
+Or serve it with a local HTTP server:
+
+```bash
+python3 -m http.server 8000
+# Visit http://localhost:8000
+```
+
+## Tutorial Contents
+
+1. **Step 0** -- Prerequisites (hardware and software)
+2. **Step 1** -- Installing system dependencies
+3. **Step 2** -- Installing and verifying ROCm
+4. **Step 3** -- Cloning the TurboQuant ROCm fork of llama.cpp
+5. **Step 4** -- CMake build with HIP
+6. **Step 5** -- Downloading and quantizing a model
+7. **Step 6** -- Running tests, benchmarks and server mode
+8. **Expected Results** -- VRAM and performance comparison
+9. **Troubleshooting** -- Resolving common errors
+
+## Hardware Requirements
+
+- AMD GPU, RDNA2 or newer
+- 8 GB+ VRAM recommended
+- Linux (Ubuntu 22.04/24.04 or Fedora 39+)
+
+## License
+
+ISC License -- see the [LICENSE](LICENSE) file for details.
+
+## Reference Fork
+
+This tutorial is based on the experimental fork: [jagsan-cyber/turboquant-rocm-llamacpp](https://github.com/jagsan-cyber/turboquant-rocm-llamacpp)
+
+> **Note**: This fork has not yet been merged into the main llama.cpp repository.
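
Reviewer note (not part of the committed README): the VRAM reduction claimed in the description can be sanity-checked with a back-of-the-envelope estimate. The sketch below uses the standard KV cache sizing (one K and one V tensor per layer, each `n_kv_heads * head_dim` elements per token) and the nominal bit widths quoted in the README. The model shape and context length are hypothetical values chosen for illustration, and per-block scale overhead of the real formats is ignored, so actual numbers will be somewhat higher.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_k, bits_v):
    """Estimate KV cache size in bytes.

    Counts one K and one V tensor per layer, each holding
    n_kv_heads * head_dim elements per cached token.
    Ignores per-block scales/metadata of real quantized formats.
    """
    elements_per_tensor = n_layers * n_kv_heads * head_dim * n_ctx
    total_bits = elements_per_tensor * (bits_k + bits_v)
    return total_bits // 8


# Hypothetical model shape and context length (assumptions, not from the README).
SHAPE = dict(n_layers=28, n_kv_heads=4, head_dim=128, n_ctx=32768)

# f16 for both K and V versus the nominal ~1-bit K / 4-bit V quoted in the README.
baseline = kv_cache_bytes(**SHAPE, bits_k=16, bits_v=16)
compressed = kv_cache_bytes(**SHAPE, bits_k=1, bits_v=4)

print(f"f16 K/V:        {baseline / 2**30:.2f} GiB")
print(f"1-bit K, 4-bit V: {compressed / 2**30:.2f} GiB")
print(f"reduction:      {baseline / compressed:.1f}x")
```

Under these assumptions the ratio depends only on the bit widths (32 bits per K/V element pair versus 5), so the same reduction factor applies to any model shape or context length.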