Data Configure
Before starting OntoAnno, open configs/demo.yaml and fill in the fields for
your dataset. This is the minimum template for a normal first run.
If you want optional features such as reference-label comparison, PDF evidence,
or precomputed marker genes, use configs/demo_optional.yaml as the example
template.
Required Fields
For a first run, you usually only need to edit these fields:
    project:
      name: MyProject
      work_dir: /work/MyProject
    inputs:
      seurat_rds: /data/my_project/my_dataset.rds
    annotation:
      species: human
      tissue_name: human pancreatic tumor
      parent_res:
        - 0.1
        - 0.3
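
| Field | What it controls | Example |
|---|---|---|
| project.name | A short name for this OntoAnno run. Use letters, numbers, or underscores. | MyProject |
| project.work_dir | Where OntoAnno saves memory, intermediate files, reviewed labels, and reports. | /work/MyProject |
| inputs.seurat_rds | The Seurat RDS file used as input. | /data/my_project/my_dataset.rds |
| annotation.species | The species used for ontology and marker evidence lookup. | human |
| annotation.tissue_name | A biological description of the dataset tissue or disease context. | human pancreatic tumor |
| annotation.parent_res | Clustering resolutions OntoAnno will test for parent annotation. | 0.1, 0.3 |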
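
A quick pre-flight check can catch a missing required field before a run starts. The sketch below is not part of OntoAnno: `REQUIRED`, `NAME_PATTERN`, and `check_required` are hypothetical names, and it assumes the config has already been loaded into a Python dictionary with the keys nested under project, inputs, and annotation.

```python
import re

# Required keys as (section, key) pairs. The nesting is an assumption
# based on the template; adjust it if your file is laid out differently.
REQUIRED = [
    ("project", "name"),
    ("project", "work_dir"),
    ("inputs", "seurat_rds"),
    ("annotation", "species"),
    ("annotation", "tissue_name"),
    ("annotation", "parent_res"),
]

# The run name should use letters, numbers, or underscores.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def check_required(config):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for section, key in REQUIRED:
        value = (config.get(section) or {}).get(key)
        if value is None or value == "" or value == []:
            problems.append(f"missing {section}.{key}")
    name = (config.get("project") or {}).get("name")
    if isinstance(name, str) and name and not NAME_PATTERN.fullmatch(name):
        problems.append("project.name should use letters, numbers, or underscores")
    return problems


# Example values copied from the template.
example = {
    "project": {"name": "MyProject", "work_dir": "/work/MyProject"},
    "inputs": {"seurat_rds": "/data/my_project/my_dataset.rds"},
    "annotation": {
        "species": "human",
        "tissue_name": "human pancreatic tumor",
        "parent_res": [0.1, 0.3],
    },
}
problems = check_required(example)
```

With the template values, `problems` is an empty list. Returning a list rather than raising on the first issue lets you fix every problem in one pass.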
Optional Fields
These fields are shown in configs/demo_optional.yaml. Change them only when
you want to use the specific feature or intentionally change pipeline behavior.
Optional Input Sources

| Field | What it controls | Example / options |
|---|---|---|
|  | Optional labels from another method. This does not change OntoAnno annotation; it is used only when |  |
|  | Folder of literature PDFs for marker evidence extraction. Empty or |  |
|  | Existing cluster marker files if you want to skip marker detection and start from annotation. The Seurat object must already contain matching |  |
Annotation Behavior

| Field | What it controls | Example / options |
|---|---|---|
|  | Whether OntoAnno should run Seurat normalization, variable feature selection, scaling, PCA, UMAP, and clustering before annotation. Set |  |
|  | Resolutions used only when you ask OntoAnno to subcluster a parent cell type. |  |
|  | Minimum number of cells required before running subclustering for a parent cell type. |  |
|  | Number of repeated LLM calls for parent annotation. Larger values cost more and take longer but can stabilize labels. |  |
|  | Number of repeated LLM calls for subcluster annotation. Ignored if no subclustering is requested. |  |
Ontology And Review Policy

| Field | What it controls | Example / options |
|---|---|---|
|  | Whether RAG review should prefer ontology-mapped candidate labels when possible. |  |
|  | Label specificity during review. This does not change clustering resolution. |  |
|  | Whether tied label decisions should go to human review. |  |
|  | Whether unknown or poorly matched labels should go to human review. |  |
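
To see why repeated LLM calls can stabilize labels, and why ties may need a human decision, consider a simple majority vote over the labels returned by the repeats. This section does not describe how OntoAnno actually aggregates repeated calls, so the rule below is assumed purely for illustration, and `consensus_label` is a hypothetical helper.

```python
from collections import Counter


def consensus_label(labels):
    """Return (label, needs_review) for labels from repeated calls.

    Illustration only: a plain majority vote, not OntoAnno's own rule.
    """
    if not labels:
        # Nothing to vote on, so a person has to decide.
        return None, True
    counts = Counter(labels).most_common()
    top_label, top_count = counts[0]
    # A tie exists when the runner-up has as many votes as the winner.
    tied = len(counts) > 1 and counts[1][1] == top_count
    if tied:
        return None, True
    return top_label, False


stable = consensus_label(["T cell", "T cell", "NK cell"])
tie = consensus_label(["T cell", "NK cell"])
```

Here `stable` is `("T cell", False)` and `tie` is `(None, True)`. With only two repeats a single disagreement produces a tie, which is one reason a larger repeat count can give steadier labels at higher cost.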
LLM And PDF Models

| Field | What it controls | Example / options |
|---|---|---|
|  | Model used for main annotation and review. Changing it can change labels, cost, and runtime. |  |
|  | Model used only when |  |
Subclustering

| Field | What it controls | Example / options |
|---|---|---|
|  | Parent cell types to split into finer subclusters. Keep |  |
Evaluation And Report

| Field | What it controls | Example / options |
|---|---|---|
|  | Whether to compare OntoAnno labels against |  |
|  | Column name in |  |
|  | Final report format. |  |
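
As a rough idea of what a label comparison can measure, the sketch below computes the fraction of cells whose two labels match exactly. It is a generic illustration, not OntoAnno's evaluation code; `label_agreement` is a hypothetical helper, and exact string matching is an assumption that will undercount agreement when the two methods use different vocabularies.

```python
def label_agreement(predicted, reference):
    """Return the fraction of positions where the two labels match exactly."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not predicted:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)


# Hypothetical per-cell labels from two methods.
score = label_agreement(
    ["T cell", "B cell", "NK cell", "T cell"],
    ["T cell", "B cell", "T cell", "T cell"],
)
```

In this example three of the four labels match, so `score` is 0.75.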
Advanced Fields Not Shown In demo_optional.yaml
Most users should not edit these. They are still supported by the config loader when needed.

| Field | When it matters | Notes |
|---|---|---|
|  | Import an existing GPTAnno parent annotation output for review/RAG/report workflows. | Advanced import path. It is not a replacement for |
|  | Use a custom OpenAI-compatible gateway. | Keep default |
|  | Override the model's system instruction. | Can change annotation behavior; leave unset unless you know why. |
|  | Force advanced subcluster label or resolution choices. | Developer or expert use only. |
|  | Run external baseline comparison commands. | Advanced; can execute shell commands. |