docs: simplify documentation - clarify two separate systems

Rewrite documentation to clearly separate the two systems:

README.md Changes:
- Restructure as two independent systems (NotebookLM Agent vs DocuMente)
- Clear separation of requirements:
  * NotebookLM Agent: NO Qdrant needed
  * DocuMente RAG: Qdrant REQUIRED
- Remove confusing 'dual-system' language
- Add FAQ section clarifying common questions
- Simplified examples for each system
- Clear statement: systems work independently

docs/integration.md Changes:
- Remove overly complex architecture diagrams
- Focus on practical usage only
- Simplified to 3 steps: start services → sync → query
- Remove redundant API documentation (refer to SKILL.md)
- Add clear use cases section
- Shorter troubleshooting section

docs/README.md Changes:
- Minimal structure overview
- Clear separation of endpoints by system
- Quick links to relevant docs

Impact:
- 821 lines removed, 259 added
- Much clearer for new users
- No confusion about Qdrant requirements
- Clear distinction between the two systems

Closes documentation clarity issue
2026-04-06 18:48:16 +02:00
parent e239829938
commit 2aa96e9efa
3 changed files with 287 additions and 849 deletions
@@ -1,71 +1,55 @@
-# Documentation
+# Documentazione

-Welcome to the NotebookLM Agent API documentation.
+## Structure

-## Contents
+```
+docs/
+├── README.md          # This file - overview
+├── integration.md     # How to use NotebookLM with RAG
+└── api/               # Detailed API documentation
+```

- [API Reference](./api/) - Complete API documentation (TODO)
- [Examples](./examples/) - Usage examples (TODO)
+## Quick Guides

-## Overview
+### NotebookLM Agent Only
+Qdrant is not required. See the "NotebookLM Agent" section of the [main README](../README.md).

-NotebookLM Agent API provides:
+### NotebookLM + RAG
+Qdrant is required. See [integration.md](./integration.md).

-1. **REST API** for managing notebooks, sources, chat, and content generation
-2. **Webhook System** for event-driven notifications
-3. **AI Skill** for integration with AI agents
+### API Reference
+Detailed endpoints are in [api/endpoints.md](./api/endpoints.md).
+
+---

 ## Main Endpoints

-### Notebook Management
- `POST /api/v1/notebooks` - Create a notebook
- `GET /api/v1/notebooks` - List notebooks
- `GET /api/v1/notebooks/{id}` - Get a notebook
- `DELETE /api/v1/notebooks/{id}` - Delete a notebook
+### NotebookLM Agent
+```
+POST   /api/v1/notebooks              # Create a notebook
+GET    /api/v1/notebooks              # List notebooks
+POST   /api/v1/notebooks/{id}/sources # Add a source
+POST   /api/v1/notebooks/{id}/chat    # Chat
+POST   /api/v1/webhooks               # Register a webhook
+```

-### Source Management
- `POST /api/v1/notebooks/{id}/sources` - Add a source
- `GET /api/v1/notebooks/{id}/sources` - List sources
- `POST /api/v1/notebooks/{id}/sources/research` - Web research
+### DocuMente RAG
+```
+POST   /api/v1/documents              # Upload a document
+POST   /api/v1/query                  # RAG query
+POST   /api/v1/notebooklm/sync/{id}   # Sync a notebook
+GET    /api/v1/notebooklm/indexed     # List synced notebooks
+```

-### Content Generation
- `POST /api/v1/notebooks/{id}/generate/audio` - Generate a podcast
- `POST /api/v1/notebooks/{id}/generate/video` - Generate a video
- `POST /api/v1/notebooks/{id}/generate/quiz` - Generate a quiz
- `POST /api/v1/notebooks/{id}/generate/flashcards` - Generate flashcards
-
-### Webhooks
- `POST /api/v1/webhooks` - Register a webhook
- `GET /api/v1/webhooks` - List webhooks
- `POST /api/v1/webhooks/{id}/test` - Test a webhook
+---

 ## Authentication

-All API requests require the `X-API-Key` header:
-
-```bash
-curl http://localhost:8000/api/v1/notebooks \
-  -H "X-API-Key: your-api-key"
+Required header:
+```
+X-API-Key: your-api-key
 ```
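
As a rough illustration of the required header, the sketch below builds (but does not send) a request carrying `X-API-Key` with Python's standard library. The base URL, the key value, and the helper name `build_request` are placeholders chosen for this example, not part of the documented API.

```python
import urllib.request


def build_request(base_url: str, path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request with the X-API-Key header attached (nothing is sent)."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="GET")


# Placeholder values: adjust host, port, and key to your deployment.
req = build_request("http://localhost:8000", "/api/v1/notebooks", "your-api-key")
print(req.full_url)
print(req.header_items())
```

Passing `req` to `urllib.request.urlopen` would actually send the request to a running server.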

-## Webhook Security
+---

-Webhooks include an HMAC-SHA256 signature in the `X-Webhook-Signature` header:
-
-```python
-import hmac
-import hashlib
-
-signature = hmac.new(
-    secret.encode(),
-    payload.encode(),
-    hashlib.sha256
-).hexdigest()
-```
-
-## Resources
-
- [README](../README.md) - Project overview
- [PRD](../prd.md) - Product requirements
- [SKILL.md](../SKILL.md) - Skill for AI agents
- [CONTRIBUTING](../CONTRIBUTING.md) - How to contribute
+For complete information, see [SKILL.md](../SKILL.md).
@@ -1,436 +1,200 @@
-# NotebookLM + RAG Integration Guide
+# NotebookLM with RAG Integration Guide

-This document describes the integration between **NotebookLM Agent** and **DocuMente RAG**, which enables semantic search (RAG) over the contents of Google NotebookLM notebooks.
+This guide explains how to use **NotebookLM** with DocuMente's **RAG** system.

 ---

-## Contents
+## Use Cases

- [Overview](#overview)
- [Architecture](#architecture)
- [How It Works](#how-it-works)
- [API Reference](#api-reference)
- [Usage Examples](#usage-examples)
- [Best Practices](#best-practices)
- [Troubleshooting](#troubleshooting)
+1. **Notebook search**: "What do my notebooks say about AI?"
+2. **Combined search**: "What do I have on Python in my PDF documents and notebooks?"
+3. **Multi-notebook analysis**: "Compare the conclusions of notebooks A and B"

 ---

-## Overview
+## Requirements

-The integration bridges the gap between **notebook management** (NotebookLM Agent) and **semantic search** (DocuMente RAG), letting you:
-
- 🔍 **Search** notebook contents with semantic search
- 🧠 **Use multi-provider LLMs** to query notebooks
- 📊 **Combine** notebooks and local documents in the same queries
- 🎯 **Filter** by specific notebooks
- ⚡ **Index** contents automatically
-
-### Use Cases
-
-1. **Research Assistant**: "What do all my notebooks say about artificial intelligence?"
-2. **Knowledge Mining**: "Find every source that discusses Python in my programming notebooks"
-3. **Cross-Notebook Analysis**: "Compare the conclusions of notebook A and notebook B"
-4. **Document + Notebook Search**: "What information do I have in both my PDF documents and my notebooks?"
+- A working NotebookLM Agent
+- DocuMente RAG with Qdrant running
+- At least one notebook in NotebookLM

 ---

-## Architecture
+## Simple Architecture

 ```
-┌─────────────────────────────────────────────────────────────────┐
-│                        NotebookLM Agent                         │
-│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────┐ │
-│  │  Notebooks  │───▶│   Sources   │───▶│   Full Text Get     │ │
-│  └─────────────┘    └─────────────┘    └─────────────────────┘ │
-└─────────────────────────────────────────────────────────────────┘
-                              │
-                              │ Extract Content
-                              ▼
-┌─────────────────────────────────────────────────────────────────┐
-│                   NotebookLMIndexerService                      │
-│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────┐ │
-│  │   Chunking  │───▶│  Embedding  │───▶│   Metadata Store    │ │
-│  └─────────────┘    └─────────────┘    └─────────────────────┘ │
-└─────────────────────────────────────────────────────────────────┘
-                              │
-                              │ Index to Vector Store
-                              ▼
-┌─────────────────────────────────────────────────────────────────┐
-│                         Qdrant Vector Store                     │
-│  ┌───────────────────────────────────────────────────────────┐  │
-│  │  Collection: "documents"                                  │  │
-│  │  Points with metadata:                                    │  │
-│  │    - notebook_id, source_id, source_title                 │  │
-│  │    - notebook_title, source_type                          │  │
-│  │    - source: "notebooklm"                                 │  │
-│  └───────────────────────────────────────────────────────────┘  │
-└─────────────────────────────────────────────────────────────────┘
-                              │
-                              │ Query with Filters
-                              ▼
-┌─────────────────────────────────────────────────────────────────┐
-│                          RAGService                             │
-│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────┐ │
-│  │    Query    │───▶│   Search    │───▶│   LLM Generation    │ │
-│  └─────────────┘    └─────────────┘    └─────────────────────┘ │
-└─────────────────────────────────────────────────────────────────┘
+┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
+│  NotebookLM     │────▶│  DocuMente RAG   │────▶│   LLM Provider  │
+│  (Google)       │     │  (Qdrant + API)  │     │   (OpenAI/etc)  │
+└─────────────────┘     └──────────────────┘     └─────────────────┘
+         │                        │
+         │ Sync                   │ Query
+         ▼                        ▼
+  Notebook               Semantic search
+  contents               over documents + notebooks
 ```

 ---

-## How It Works
+## Quick Start

-### 1. Synchronization
-
-When you sync a notebook:
-
-1. **Extraction**: Fetches all sources from the notebook via `notebooklm-py`
-2. **Full Text**: Retrieves the full text of each source (if available)
-3. **Chunking**: Splits the content into chunks of ~1024 characters
-4. **Embedding**: Generates vector embeddings using OpenAI
-5. **Storage**: Saves to Qdrant with complete metadata
-
-### 2. Metadata Structure
-
-Each stored chunk contains:
-
-```json
-{
-  "text": "chunk content...",
-  "notebook_id": "notebook-uuid",
-  "source_id": "source-uuid",
-  "source_title": "Source Title",
-  "source_type": "url|file|youtube|drive",
-  "notebook_title": "Notebook Title",
-  "source": "notebooklm"
-}
-```
-
-### 3. Query
-
-When you run a query:
-
-1. **Embedding**: The question is converted into an embedding
-2. **Search**: Qdrant finds the most similar chunks
-3. **Filter**: If specified, results are filtered by `notebook_id`
-4. **Context**: The chunks are formatted as context
-5. **Generation**: The LLM generates the answer based on the context
-
---
-
-## API Reference
-
-### Sync Endpoints
-
-#### POST `/api/v1/notebooklm/sync/{notebook_id}`
-Syncs a notebook from NotebookLM to the vector store.
-
-**Response:**
-```json
-{
-  "sync_id": "sync-uuid",
-  "notebook_id": "notebook-uuid",
-  "notebook_title": "Notebook Title",
-  "status": "success",
-  "sources_indexed": 5,
-  "total_chunks": 42,
-  "message": "Successfully synced 5 sources with 42 chunks"
-}
-```
-
-#### GET `/api/v1/notebooklm/indexed`
-Lists all synced notebooks.
-
-**Response:**
-```json
-{
-  "notebooks": [
-    {
-      "notebook_id": "uuid-1",
-      "notebook_title": "AI Research",
-      "sources_count": 10,
-      "chunks_count": 150,
-      "last_sync": "2026-01-15T10:30:00Z"
-    }
-  ],
-  "total": 1
-}
-```
-
-#### DELETE `/api/v1/notebooklm/sync/{notebook_id}`
-Removes a notebook from the vector store.
-
-**Response:**
-```json
-{
-  "notebook_id": "notebook-uuid",
-  "deleted": true,
-  "message": "Successfully removed index..."
-}
-```
-
-#### GET `/api/v1/notebooklm/sync/{notebook_id}/status`
-Checks the sync status of a notebook.
-
-**Response:**
-```json
-{
-  "notebook_id": "notebook-uuid",
-  "status": "indexed",
-  "sources_count": 5,
-  "chunks_count": 42,
-  "last_sync": "2026-01-15T10:30:00Z"
-}
-```
-
-### Query Endpoints
-
-#### POST `/api/v1/query` (with notebook filter)
-Runs a RAG query, optionally filtered by notebook.
-
-**Request:**
-```json
-{
-  "question": "What are the key points?",
-  "notebook_ids": ["uuid-1", "uuid-2"],
-  "include_documents": true,
-  "k": 10,
-  "provider": "openai",
-  "model": "gpt-4o"
-}
-```
-
-**Response:**
-```json
-{
-  "question": "What are the key points?",
-  "answer": "According to the documents and notebooks analyzed...",
-  "provider": "openai",
-  "model": "gpt-4o",
-  "sources": [
-    {
-      "text": "Chunk content...",
-      "source_type": "notebooklm",
-      "notebook_id": "uuid-1",
-      "notebook_title": "AI Research",
-      "source_title": "Introduction to AI"
-    }
-  ],
-  "user": "anonymous",
-  "filters_applied": {
-    "notebook_ids": ["uuid-1", "uuid-2"],
-    "include_documents": true
-  }
-}
-```
-
-#### POST `/api/v1/query/notebooks`
-Runs a query **only** against notebooks (local documents are excluded).
-
-**Request:**
-```json
-{
-  "question": "Find information about...",
-  "notebook_ids": ["uuid-1"],
-  "k": 10,
-  "provider": "anthropic"
-}
-```
-
---
-
-## Usage Examples
-
-### Example 1: Basic Sync and Query
+### 1. Start the Services

 ```bash
-# 1. Sync a notebook
-curl -X POST http://localhost:8000/api/v1/notebooklm/sync/abc-123
+# Terminal 1: Qdrant (required for RAG)
+docker run -p 6333:6333 qdrant/qdrant

-# 2. Query the synced notebook
+# Terminal 2: DocuMente API
+uv run fastapi dev src/agentic_rag/api/main.py
+
+# Terminal 3 (optional): Web UI
+cd frontend && npm run dev
+```
+
+### 2. Sync a Notebook
+
+```bash
+# Get the notebook ID from NotebookLM
+curl http://localhost:8000/api/v1/notebooks
+
+# Sync it into the RAG
+curl -X POST http://localhost:8000/api/v1/notebooklm/sync/{NOTEBOOK_ID}
+```
+
+### 3. Ask a Question
+
+```bash
+# Search notebooks only
 curl -X POST http://localhost:8000/api/v1/query/notebooks \
  -H "Content-Type: application/json" \
  -d '{
-    "question": "Which AI technologies are mentioned?",
-    "notebook_ids": ["abc-123"]
+    "question": "What are the key points?",
+    "notebook_ids": ["uuid-1"]
  }'
-```

-### Example 2: Multi-Notebook Search
-
-```bash
-# Query multiple notebooks at once
+# Search documents + notebooks
 curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
-    "question": "Compare the machine learning approaches described",
-    "notebook_ids": ["notebook-1", "notebook-2", "notebook-3"],
-    "k": 15,
-    "provider": "anthropic"
+    "question": "Compare the sources",
+    "notebook_ids": ["uuid-1"],
+    "include_documents": true
  }'
 ```
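
The two curl calls in step 3 differ only in path and in the `include_documents` flag. A minimal Python sketch of that choice follows; it builds the request without sending it. The helper name `build_query` and the base URL are illustrative assumptions, and only the fields shown in the curl examples are used.

```python
import json
import urllib.request


def build_query(base_url, question, notebook_ids, include_documents=False):
    """Build (but do not send) a POST request for a RAG query.

    Assumption: notebook-only searches go to /api/v1/query/notebooks and
    combined searches go to /api/v1/query with include_documents=true,
    mirroring the curl examples in this guide.
    """
    payload = {"question": question, "notebook_ids": list(notebook_ids)}
    if include_documents:
        path = "/api/v1/query"
        payload["include_documents"] = True
    else:
        path = "/api/v1/query/notebooks"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query("http://localhost:8000", "Compare the sources", ["uuid-1"],
                  include_documents=True)
print(req.full_url)
print(req.data.decode("utf-8"))
```

Sending the JSON body with an explicit `Content-Type: application/json` header matches what the curl examples do with `-H`.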

-### Example 3: Complete Workflow
-
-```bash
-#!/bin/bash
-
-# 1. Get the notebook list from NotebookLM
-NOTEBOOKS=$(curl -s http://localhost:8000/api/v1/notebooks)
-
-# 2. Sync the first notebook
-NOTEBOOK_ID=$(echo $NOTEBOOKS | jq -r '.data.items[0].id')
-echo "Syncing notebook: $NOTEBOOK_ID"
-
-SYNC_RESULT=$(curl -s -X POST "http://localhost:8000/api/v1/notebooklm/sync/$NOTEBOOK_ID")
-echo "Result: $SYNC_RESULT"
-
-# 3. Wait for the sync to complete (if asynchronous)
-sleep 2
-
-# 4. Query the notebook
-curl -X POST http://localhost:8000/api/v1/query/notebooks \
-  -H "Content-Type: application/json" \
-  -d "{
-    \"question\": \"Summarize the main points\",
-    \"notebook_ids\": [\"$NOTEBOOK_ID\"],
-    \"provider\": \"openai\"
-  }"
-```
-
 ---

-## Best Practices
+## API Endpoints

-### 1. **Selective Sync**
-Don't sync every notebook, only those relevant to your searches.
+### Notebook Management
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/api/v1/notebooklm/sync/{id}` | POST | Sync a notebook |
+| `/api/v1/notebooklm/indexed` | GET | List synced notebooks |
+| `/api/v1/notebooklm/sync/{id}/status` | GET | Check sync status |
+| `/api/v1/notebooklm/sync/{id}` | DELETE | Remove a notebook from the index |
+
+### Query
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/api/v1/query/notebooks` | POST | Search notebooks only |
+| `/api/v1/query` | POST | Search documents + notebooks |
+
+---
+
+## Examples
+
+### Synchronization

 ```bash
-# Sync only the active notebooks
-for notebook_id in "notebook-1" "notebook-2"; do
-  curl -X POST "http://localhost:8000/api/v1/notebooklm/sync/$notebook_id"
-done
+# Sync
+curl -X POST http://localhost:8000/api/v1/notebooklm/sync/abc-123
+
+# Response:
+# {
+#   "sync_id": "...",
+#   "notebook_id": "abc-123",
+#   "status": "success",
+#   "sources_indexed": 5,
+#   "total_chunks": 42
+# }
 ```
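
One way a client might consume the sync response shown above is sketched here in Python. The sample body is copied from the example; treating any `status` other than `"success"` as a failure is an assumption, since other status values are not documented here, and `summarize_sync` is a hypothetical helper name.

```python
import json

# Sample body taken from the example response in this guide.
SAMPLE_RESPONSE = """
{
  "sync_id": "...",
  "notebook_id": "abc-123",
  "status": "success",
  "sources_indexed": 5,
  "total_chunks": 42
}
"""


def summarize_sync(body: str) -> str:
    """Return a one-line summary of a sync response, or raise if it did not succeed."""
    data = json.loads(body)
    # Assumption: anything other than "success" is treated as a failed sync.
    if data.get("status") != "success":
        raise RuntimeError(
            f"sync failed for {data.get('notebook_id')}: {data.get('status')}"
        )
    return (
        f"{data['notebook_id']}: {data['sources_indexed']} sources, "
        f"{data['total_chunks']} chunks"
    )


print(summarize_sync(SAMPLE_RESPONSE))
```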

-### 2. **Chunk Management**
-Each source is split into chunks of ~1024 characters. If a notebook has many large sources, consider:
- Increasing `k` in queries (default: 5, max: 50)
- Filtering by specific notebooks to reduce the context
-
-### 3. **Provider Selection**
-Use different providers for different kinds of queries:
- **OpenAI GPT-4o**: Complex queries, detailed analysis
- **Anthropic Claude**: Long summaries, text analysis
- **Mistral**: Fast queries, concise answers
-
-### 4. **Periodic Refresh**
-Notebooks change over time. Consider:
- Removing and re-syncing periodically
- Adding a scheduled job for the refresh
+### Filtered Queries

 ```bash
-# Cron job for a weekly refresh
-0 2 * * 0 /path/to/sync-notebooks.sh
+# Multi-notebook
+curl -X POST http://localhost:8000/api/v1/query/notebooks \
+  -H "Content-Type: application/json" \
+  -d '{
+    "question": "AI trends",
+    "notebook_ids": ["id-1", "id-2", "id-3"],
+    "provider": "openai"
+  }'
+
+# With a local model (Ollama)
+curl -X POST http://localhost:8000/api/v1/query/notebooks \
+  -H "Content-Type: application/json" \
+  -d '{
+    "question": "Summarize",
+    "notebook_ids": ["id-1"],
+    "provider": "ollama",
+    "model": "llama3.2"
+  }'
 ```

-### 5. **Monitoring**
-Track which notebooks are synced:
+---
+
+## Web UI
+
+If you started the frontend:
+
+1. Go to http://localhost:3000
+2. Open the **Chat** section
+3. Select the notebooks from the list
+4. Ask your questions
+
+---
+
+## LLM Providers
+
+You can use any supported provider:
+
+**Cloud**: OpenAI, Anthropic, Google, Mistral, Azure  
+**Local**: Ollama, LM Studio

 ```bash
-# List and check status
-curl http://localhost:8000/api/v1/notebooklm/indexed | jq '.'
+# Example with Ollama (local)
+curl -X POST http://localhost:8000/api/v1/query/notebooks \
+  -H "Content-Type: application/json" \
+  -d '{
+    "question": "Explain...",
+    "notebook_ids": ["id-1"],
+    "provider": "ollama",
+    "model": "llama3.2"
+  }'
 ```

 ---

 ## Troubleshooting

-### Problem: Sync failed
+**Problem: "Notebook not found"**  
+→ Check that the notebook exists in NotebookLM

-**Symptoms**: 500 error during sync
+**Problem: Qdrant is not responding**  
+→ Check that Qdrant is running: `docker ps`

-**Cause**: NotebookLM may not have the full text available for some sources
-
-**Solution**:
-1. Check that the notebook exists: `GET /api/v1/notebooks/{id}`
-2. Check that the sources are indexed: NotebookLM shows "Ready"
-3. Some sources (YouTube, Drive) may not have extracted text
-
-### Problem: Query returns no results
-
-**Symptoms**: Response "I don't have enough information..."
-
-**Check**:
-```bash
-# 1. Is the notebook synced?
-curl http://localhost:8000/api/v1/notebooklm/sync/{notebook_id}/status
-
-# 2. How many chunks are there?
-curl http://localhost:8000/api/v1/notebooklm/indexed
-```
-
-**Solution**:
- Increase `k` in the query
- Check that the content was actually extracted
- Check that the embedding model is configured correctly
-
-### Problem: Rate Limiting
-
-**Symptoms**: 429 errors during sync
-
-**Solution**:
- NotebookLM has aggressive rate limits
- Add a delay between syncs
- Sync during low-traffic hours
-
-```python
-# Add a delay
-import asyncio
-
-for notebook_id in notebook_ids:
-    await sync_notebook(notebook_id)
-    await asyncio.sleep(5)  # Wait 5 seconds
-```
+**Problem: No results**  
+→ Check that the notebook is synced: `GET /api/v1/notebooklm/indexed`
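
The "no results" check can be scripted. The Python sketch below tests whether a notebook ID appears in the body returned by `GET /api/v1/notebooklm/indexed`. The response shape (a `notebooks` list whose items carry `notebook_id`) is taken from the earlier API reference that this commit removes, so treat it as an assumption; `is_indexed` is a hypothetical helper.

```python
def is_indexed(indexed_response: dict, notebook_id: str) -> bool:
    """Return True if notebook_id appears in an /indexed response body.

    Assumption: the body looks like {"notebooks": [{"notebook_id": ...}, ...], "total": N}.
    """
    notebooks = indexed_response.get("notebooks", [])
    return any(nb.get("notebook_id") == notebook_id for nb in notebooks)


# Sample data in the assumed shape.
sample = {
    "notebooks": [
        {
            "notebook_id": "uuid-1",
            "notebook_title": "AI Research",
            "sources_count": 10,
            "chunks_count": 150,
            "last_sync": "2026-01-15T10:30:00Z",
        }
    ],
    "total": 1,
}

print(is_indexed(sample, "uuid-1"))
print(is_indexed(sample, "uuid-2"))
```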

 ---

-## Performance Considerations
+## Limitations

-### Chunk Size
- **Default**: 1024 characters
- **Trade-off**: 
-  - Larger chunks = more context but less precision
-  - Smaller chunks = more precision but less context
-
-### Number of Notebooks
- **Recommended**: < 50 notebooks synced at the same time
- **Optimal**: Filter by specific notebooks in queries
-
-### Refresh Strategy
- **Full Refresh**: Remove everything and re-sync (slow but clean)
- **Incremental**: Add only new sources (faster but may produce duplicates)
+- Content must be retrievable from NotebookLM (some PDFs may have no extractable text)
+- Synchronization is manual (it does not run automatically when a notebook changes)
+- Each source is split into chunks of ~1024 characters
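
To give a feel for the ~1024-character chunks, here is a naive fixed-size splitter in Python. It is only a sketch: the actual indexer may split on sentence or token boundaries or use overlapping chunks, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, size: int = 1024) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


# A 2500-character source yields two full chunks and one shorter remainder.
chunks = chunk_text("x" * 2500)
print([len(c) for c in chunks])  # [1024, 1024, 452]
```

Under this scheme a source of N characters produces roughly N / 1024 chunks, rounded up, which is a quick way to estimate `total_chunks` for a sync.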

 ---

-## Known Limitations
-
-1. **Full Text**: Not all NotebookLM sources have full text available (e.g. some PDFs, YouTube)
-2. **No Automatic Sync**: Synchronization is manual via the API, not automatic
-3. **Storage**: Chunks duplicate storage (content lives in both NotebookLM and Qdrant)
-4. **Embedding Model**: Currently uses OpenAI for embeddings (configurable in the future)
-
---
-
-## Roadmap
-
- [ ] **Auto-Sync**: Automatic synchronization when notebooks change
- [ ] **Incremental Sync**: Update only the modified sources
- [ ] **Multi-Embedder**: Support for other embedding models
- [ ] **Semantic Chunking**: Chunking based on meaning rather than length
- [ ] **Cross-Reference**: Links between similar sources in different notebooks
-
---
-
-**Version**: 1.0.0  
-**Last Updated**: 2026-04-06
+For advanced questions, see [SKILL.md](../SKILL.md).