Create datasets that can be measured.

Alys turns a research prompt into local-first dataset artifacts: discovered, verified, deduplicated, structured, benchmarked, and ready for model training.

Sample run output:

  [SRC]  ranked 38 source candidates
  [CHK]  resolved 7 contradiction clusters
  [DED]  reduced duplicate records by 24%
  [EVAL] schema, diversity, utility checks passed

Local-first runtime: JSONL / CSV / RAG / Instruction
Pipeline: Discover / Verify / Dedupe / Score / Export

Why Alys

Not a scraper. A dataset engine.

Firecrawl extracts pages. Alys creates evaluated datasets by orchestrating research, verification, synthesis, and export inside one local runtime.

01

Discover

Find and rank source candidates.

02

Verify

Compare claims and resolve contradictions.

03

Structure

Generate training-ready records.

04

Evaluate

Score schema, diversity, utility, and leakage risk.
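The four stages above compose into one pass over a prompt. Here is a minimal sketch of that shape; the stage names come from this page, but every function body, field name, and return value is a hypothetical stand-in, not Alys's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    text: str     # hypothetical record shape, for illustration only
    source: str

def discover(prompt: str) -> list[str]:
    # Hypothetical: rank and return candidate source URLs for the prompt.
    return [f"https://example.org/{prompt}/{i}" for i in range(3)]

def structure(sources: list[str]) -> list[Record]:
    # Hypothetical: turn each source into training-ready records.
    return [Record(text=f"claim extracted from {s}", source=s) for s in sources]

def verify(records: list[Record]) -> list[Record]:
    # Hypothetical: drop records flagged during contradiction resolution.
    return [r for r in records if "contradiction" not in r.text]

def evaluate(records: list[Record]) -> dict:
    # Hypothetical: report simple schema/diversity counts.
    return {"count": len(records), "unique": len({r.text for r in records})}

def run(prompt: str) -> dict:
    return evaluate(verify(structure(discover(prompt))))
```

Discover feeds Structure, Verify filters, Evaluate scores — the same Discover / Verify / Structure / Evaluate order listed above.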

Install Alys

Start from your terminal.

Run Alys with npx.

npx alys-akusa

Evaluation

Benchmarks are part of the product.

Statistical fidelity

KS, TVD, Wasserstein, and correlation preservation.
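Two of these metrics are simple enough to show in full. The sketch below is a plain reference implementation of the two-sample KS statistic (largest gap between empirical CDFs) and total variation distance over discrete histograms — generic statistics, not Alys's code.

```python
import bisect

def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max ECDF gap."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        fa = bisect.bisect_right(a, x) / len(a)  # ECDF of a at x
        fb = bisect.bisect_right(b, x) / len(b)  # ECDF of b at x
        d = max(d, abs(fa - fb))
    return d

def total_variation(p: dict, q: dict) -> float:
    """TVD between two discrete count histograms (0 = identical)."""
    np_, nq = sum(p.values()), sum(q.values())
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0) / np_ - q.get(k, 0) / nq) for k in keys)
```

For production use, `scipy.stats.ks_2samp` and `scipy.stats.wasserstein_distance` cover the continuous metrics named here.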

ML utility

TSTR-style downstream model performance checks.
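TSTR (Train on Synthetic, Test on Real) asks: does a model fit on the generated data still perform on held-out real data? A toy version, using a 1-nearest-neighbor classifier on a single numeric feature purely to keep the sketch self-contained — any real check would use a proper model:

```python
def nn_predict(train: list[tuple[float, str]], x: float) -> str:
    # Toy 1-NN: label of the closest training point.
    return min(train, key=lambda t: abs(t[0] - x))[1]

def tstr_accuracy(synthetic: list[tuple[float, str]],
                  real: list[tuple[float, str]]) -> float:
    """Fit on synthetic (feature, label) pairs, score on real pairs."""
    hits = sum(nn_predict(synthetic, x) == y for x, y in real)
    return hits / len(real)
```

A score far below a train-on-real baseline signals that the synthetic data dropped signal the downstream task needs.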

Diversity

Duplicate rate, entropy, redundancy, and mode-collapse signals.
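Duplicate rate and entropy are cheap to compute directly. A minimal sketch of both signals — standard formulas, independent of Alys's internals:

```python
import math
from collections import Counter

def duplicate_rate(records: list[str]) -> float:
    """Fraction of records that exactly repeat an earlier record."""
    return 1 - len(set(records)) / len(records)

def token_entropy(records: list[str]) -> float:
    """Shannon entropy (bits) of the whitespace-token distribution.
    Low entropy across many records hints at redundancy or mode collapse."""
    counts = Counter(tok for r in records for tok in r.split())
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A run of four records where one repeats gives a duplicate rate of 0.25; a corpus of one repeated token has zero entropy.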

Privacy posture

Nearest-neighbor leakage and membership-inference resistance.
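The crudest nearest-neighbor leakage check: how many synthetic records sit essentially on top of a real record, i.e. were memorized rather than generated? A sketch with a pluggable distance function; the threshold and names are assumptions, not Alys's metric definitions:

```python
from typing import Callable

def nearest_distance(x, pool, dist: Callable) -> float:
    """Distance from x to its closest neighbor in pool."""
    return min(dist(x, p) for p in pool)

def copy_fraction(synthetic, real, dist: Callable, eps: float = 1e-9) -> float:
    """Fraction of synthetic records within eps of some real record --
    a crude signal of exact or near-exact memorization."""
    leaks = sum(nearest_distance(s, real, dist) <= eps for s in synthetic)
    return leaks / len(synthetic)
```

A nonzero copy fraction, or synthetic-to-real nearest-neighbor distances much smaller than the real-to-real baseline, is the leakage signal to act on.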

Agentic pipeline

Discovery, extraction, debate, curation, and evaluation are separate modules.

Provider-managed

Users don't bring their own model keys. Alys manages quality modes, credits, and limits.

Export-native

JSONL, CSV, instruction records, RAG chunks, manifests, and benchmark reports.
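JSONL plus a manifest is the simplest of these targets: one JSON object per line, with a sidecar recording the count and a content hash. A sketch of that shape — the `instruction`/`output` field names and manifest keys are assumptions for illustration, not Alys's export schema:

```python
import hashlib
import json

def export_jsonl(records: list[dict]) -> tuple[str, dict]:
    """Serialize records to JSONL text and build a small manifest
    (record count + sha256 of the exact bytes written)."""
    text = "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)
    manifest = {
        "records": len(records),
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }
    return text, manifest

text, manifest = export_jsonl([{"instruction": "Define TVD.",
                                "output": "Total variation distance."}])
```

Hashing the exact serialized bytes means any downstream consumer can verify the artifact against its manifest before training on it.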

Use cases

Instruction tuning datasets
RAG-ready corpora
Structured Q&A generation
Domain research datasets
Sales and support simulations
Evaluation and benchmark sets

Generate less junk. Measure more truth.

Open Alys