Hybrid Knowledge Ingestion Implementation Plan

Status: active

Execution note: Use executing-plans to implement this plan task-by-task.

Goal: Build a private-first ingestion flow that extracts Google Docs/Sheets content, normalizes it, and writes structured markdown drafts into this repository.

Architecture: Use a single CLI (scripts/kb-ingest.mjs) with per-source auth mode (oauth_refresh_token, oauth_access_token, or service_account). Convert raw Google payloads into a stable normalized model via scripts/kb-ingest-lib.mjs, then write staging docs plus raw cache for traceability.
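As a rough illustration of what a "stable normalized model" could look like, the sketch below folds extracted content into one flat record. The helper name toNormalizedRecord and every field name are assumptions made for this example, not the plan's actual schema; only the gdoc and gsheet source types come from the plan itself.

```javascript
// Hypothetical normalized record. Field names are illustrative assumptions.
const SOURCE_TYPES = new Set(["gdoc", "gsheet"]);

function toNormalizedRecord({ sourceType, sourceId, title, markdown, links = [], fetchedAt }) {
  if (!SOURCE_TYPES.has(sourceType)) {
    throw new Error(`Unsupported source type: ${sourceType}`);
  }
  if (!sourceId) {
    throw new Error("sourceId is required");
  }
  // Shallow-freeze so later pipeline stages cannot reassign top-level fields.
  return Object.freeze({
    sourceType,
    sourceId,
    title: (title ?? "").trim() || "Untitled",
    markdown: markdown ?? "",
    // De-duplicate and sort so repeated runs produce identical output.
    links: [...new Set(links)].sort(),
    fetchedAt,
  });
}
```

Keeping the record flat and deterministic (sorted links, caller-supplied timestamp) is one way to make staged drafts diff cleanly between runs.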

Tech Stack: Node.js (ESM), googleapis, built-in Node test runner (node --test).


Task 1: Create Test Coverage for Normalization Layer

Files:

  • Test: tests/kb-ingest/normalize.test.mjs

Steps:

  1. Add tests for slugify, extractGoogleLinks, classifyContent, docsToMarkdown, sheetsToMarkdown.
  2. Run node --test tests/kb-ingest/normalize.test.mjs and confirm the tests fail while the implementation is still missing.

Task 2: Implement Normalization Library

Files:

  • Create: scripts/kb-ingest-lib.mjs

Steps:

  1. Implement the pure utility functions used by the ingestion pipeline (slugify, extractGoogleLinks, classifyContent, docsToMarkdown, sheetsToMarkdown).
  2. Re-run normalization tests until all pass.
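Two of the utilities named in Task 1 could be sketched as follows. The return shapes are assumptions: extractGoogleLinks is taken to return unique { type, id, url } entries, and sheetsToMarkdown is taken to receive a 2D array of cell values whose first row is the header. The plan does not specify either contract.

```javascript
// Matches Google Docs and Sheets URLs and captures the kind and file ID.
const GOOGLE_LINK_RE =
  /https:\/\/docs\.google\.com\/(document|spreadsheets)\/d\/([A-Za-z0-9_-]+)/g;

// Assumed behavior: unique links, in order of first appearance.
function extractGoogleLinks(text) {
  const seen = new Set();
  const links = [];
  for (const match of String(text).matchAll(GOOGLE_LINK_RE)) {
    const [url, kind, id] = match;
    if (seen.has(id)) continue;
    seen.add(id);
    links.push({ type: kind === "document" ? "gdoc" : "gsheet", id, url });
  }
  return links;
}

// Assumed input: rows of columns, first row treated as the table header.
function sheetsToMarkdown(rows) {
  if (!Array.isArray(rows) || rows.length === 0) return "";
  const width = Math.max(...rows.map((row) => row.length));
  // Escape pipes and flatten newlines so a cell cannot break the table.
  const cell = (value) =>
    String(value ?? "").replace(/\|/g, "\\|").replace(/\r?\n/g, " ").trim();
  // Pad short rows with empty cells so every line has the same column count.
  const line = (row) =>
    `| ${Array.from({ length: width }, (_, i) => cell(row[i])).join(" | ")} |`;
  const [header, ...body] = rows;
  const divider = `| ${Array(width).fill("---").join(" | ")} |`;
  return [line(header), divider, ...body.map(line)].join("\n");
}
```

Because both functions are pure (no network or file access), they can be covered by the normalization tests without any Google credentials.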

Task 3: Build Ingestion CLI

Files:

  • Create: scripts/kb-ingest.mjs

Steps:

  1. Add argument parsing and config loading.
  2. Add an auth client builder that handles each supported auth mode (oauth_refresh_token, oauth_access_token, service_account).
  3. Add extractors for gdoc and gsheet.
  4. Add staging markdown + raw cache outputs.
  5. Add summary/index outputs.
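The first two steps above could begin along these lines. The flags --config, --dry-run, --limit, and --help are taken from the verification commands in this plan; the helper names, the returned object shape, and the auth_mode field name are assumptions. The sketch only validates the auth mode and does not construct a real Google client.

```javascript
import { parseArgs } from "node:util";

// Parse the CLI flags that the plan's verification commands use.
function parseCliArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      config: { type: "string" },
      "dry-run": { type: "boolean" },
      limit: { type: "string" },
      help: { type: "boolean" },
    },
    strict: true, // unknown flags raise an error instead of being ignored
  });
  const limit = values.limit === undefined ? null : Number(values.limit);
  if (limit !== null && (!Number.isInteger(limit) || limit < 1)) {
    throw new Error(`--limit must be a positive integer, got "${values.limit}"`);
  }
  return {
    config: values.config ?? null,
    dryRun: values["dry-run"] ?? false,
    limit,
    help: values.help ?? false,
  };
}

// The three modes listed in the Architecture note.
const AUTH_MODES = ["oauth_refresh_token", "oauth_access_token", "service_account"];

// Hypothetical: assumes each source entry carries an `auth_mode` field.
function resolveAuthMode(source) {
  if (!AUTH_MODES.includes(source.auth_mode)) {
    throw new Error(
      `Unknown auth mode "${source.auth_mode}"; expected one of ${AUTH_MODES.join(", ")}`
    );
  }
  return source.auth_mode;
}
```

In the script itself this would typically be called as parseCliArgs(process.argv.slice(2)), with the resolved mode then deciding which googleapis auth client to construct.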

Task 4: Add Config Template and Security Guardrails

Files:

  • Create: config/knowledge-ingest/sources.example.json
  • Modify: .gitignore

Steps:

  1. Provide config template with hybrid examples.
  2. Ignore local secrets/config/cache paths.
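To make the "hybrid" template concrete, the sketch below shows one possible config shape, written as a JavaScript object, with one OAuth source and one service-account source, plus a small validator. All field names and placeholder values are assumptions; the actual layout of sources.example.json is not given in this plan. Real credentials would live in git-ignored local files or environment variables, never in the template.

```javascript
// Hypothetical template contents: placeholders only, no secrets.
const exampleConfig = {
  sources: [
    { id: "team-handbook", type: "gdoc", auth_mode: "oauth_refresh_token", document_id: "<GOOGLE_DOC_ID>" },
    { id: "metrics", type: "gsheet", auth_mode: "service_account", spreadsheet_id: "<GOOGLE_SHEET_ID>" },
  ],
};

const SOURCE_TYPES = ["gdoc", "gsheet"];
const AUTH_MODES = ["oauth_refresh_token", "oauth_access_token", "service_account"];

// Returns a list of human-readable problems; an empty list means the config passed.
function validateConfig(config) {
  if (!Array.isArray(config?.sources) || config.sources.length === 0) {
    return ["config.sources must be a non-empty array"];
  }
  const errors = [];
  const seenIds = new Set();
  config.sources.forEach((source, index) => {
    const label = source?.id ?? `#${index}`;
    if (!source?.id) errors.push(`source #${index}: missing id`);
    else if (seenIds.has(source.id)) errors.push(`source ${label}: duplicate id`);
    else seenIds.add(source.id);
    if (!SOURCE_TYPES.includes(source?.type)) errors.push(`source ${label}: unsupported type`);
    if (!AUTH_MODES.includes(source?.auth_mode)) errors.push(`source ${label}: unsupported auth_mode`);
  });
  return errors;
}
```

Collecting every problem instead of throwing on the first one lets a single run report all config mistakes at once.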

Task 5: Add Scripts and Documentation

Files:

Steps:

  1. Add kb:ingest, kb:ingest:dry, and test scripts.
  2. Document setup, auth, execution, output paths, and review workflow.

Task 6: Verify and Report

Files:

  • Verify changed files

Steps:

  1. Run node --test tests/kb-ingest/normalize.test.mjs.
  2. Run node scripts/kb-ingest.mjs --help.
  3. Run node scripts/kb-ingest.mjs --config config/knowledge-ingest/sources.example.json --dry-run --limit 1 (an auth or config failure is expected and acceptable here; argument parsing and the ingestion flow should still execute).
  4. Report evidence and remaining gaps.
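One way to get the behavior checked in step 3, where a run completes and reports even though authentication fails, is to catch errors per source and fold them into a summary. The sketch below assumes that --dry-run means "fetch but skip writes"; fetchSource and writeDraft are hypothetical injected callbacks, and the summary shape is illustrative only.

```javascript
// Runs the ingestion loop; per-source failures are recorded, not fatal.
// fetchSource and writeDraft are hypothetical async callbacks supplied by the caller.
async function runIngest(sources, { limit = null, dryRun = false, fetchSource, writeDraft }) {
  const selected = limit === null ? sources : sources.slice(0, limit);
  const summary = { dryRun, attempted: selected.length, succeeded: [], failed: [] };
  for (const source of selected) {
    try {
      const record = await fetchSource(source);
      // Assumption: a dry run exercises fetching but never writes files.
      if (!dryRun) await writeDraft(record);
      summary.succeeded.push(source.id);
    } catch (error) {
      // Keep going so one bad credential does not hide other problems.
      summary.failed.push({ id: source.id, reason: error.message });
    }
  }
  return summary;
}
```

A summary of this kind, listing each failed source with its reason, could serve as part of the evidence reported in step 4.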