# Storage Architecture & File Specification

This section outlines the directory structure and file types within a user project. All files and folders described here are rooted at /automl.

## Data Cache (Vector & Graph Stores)

* **Cached Data Index (cache/weaviate/)**
  * Type: Tarball / Dataset
  * Description: Compressed Weaviate vector database indices used for high-performance semantic search across dataset metadata.
* **Cached Dataset Lexical Strand (cache/surrealdb/datasets/)**
  * Type: Tarball / Dataset
  * Description: Lexical and relational metadata for specific datasets stored in SurrealDB format to support structured queries.
* **Cached Cohort Lexical Strand (cache/surrealdb/cohorts/)**
  * Type: Tarball / Cohort
  * Description: Specific relational data and indices for defined patient or sample cohorts.

## User & Session Management

* **User Profile Directory (users/user-\[id]/)**
  * Type: Directory / User
  * Description: The root container for all user-specific settings, preferences, and session history.
* **Settings (settings.record)**
  * Type: Record / User
  * Description: A binary or structured record file containing user-specific application preferences and interface configurations.
* **Conversation-User Links (conversations-v2.record)**
  * Type: Record / User
  * Description: A mapping file that associates specific conversation IDs with the user's profile, used for populating the user's chat history list.
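The user-level paths above can be assembled programmatically. The following is a minimal sketch, assuming that `settings.record` and `conversations-v2.record` sit directly inside `users/user-[id]/` under the /automl root (this section implies that layout but does not state it). The helper names `user_dir` and `user_files` are hypothetical and not part of any documented API.

```python
from pathlib import PurePosixPath

# Root folder named at the top of this section.
ROOT = PurePosixPath("/automl")


def user_dir(user_id: str) -> PurePosixPath:
    """Return the profile directory for a user (users/user-[id]/)."""
    return ROOT / "users" / f"user-{user_id}"


def user_files(user_id: str) -> dict:
    """Return the record files assumed to live inside the profile directory."""
    base = user_dir(user_id)
    return {
        # Assumption: both records are direct children of the user directory.
        "settings": base / "settings.record",
        "conversation_links": base / "conversations-v2.record",
    }
```

For example, `user_files("42")["settings"]` evaluates to the path `/automl/users/user-42/settings.record`; the ID `42` is purely illustrative.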

## Conversation Session Details

### **Conversation Session (conversations/\[conversation\_uuid]/)**

* Type: Directory / Conversation
* Description: A dedicated workspace for a single AI interaction session, containing all logs, plans, and generated artifacts.

### **Conversation History (history/chunk\[nnn].record)**

* Type: Record / Conversation
* Description: Segmented log files (chunks) containing the actual dialogue exchange between the user and the AI.
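Because the history is split into numbered chunks, a reader typically needs them in order. The sketch below assumes that `[nnn]` is a decimal index and that ascending index means chronological order; neither is stated in this section. The function name `ordered_chunks` is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Matches file names of the form chunk[nnn].record, capturing the index.
CHUNK_PATTERN = re.compile(r"^chunk(\d+)\.record$")


def ordered_chunks(filenames):
    """Return history chunk paths sorted by numeric index, skipping other files."""
    indexed = []
    for name in filenames:
        match = CHUNK_PATTERN.match(PurePosixPath(name).name)
        if match:
            indexed.append((int(match.group(1)), name))
    # Sort numerically so that chunk010 follows chunk002 even without padding.
    return [name for _, name in sorted(indexed)]
```

Sorting on the parsed integer rather than on the raw string keeps the order correct even if the indices are not zero-padded to the same width.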

### **Artifact-Conversation Map (artifacts.record)**

* Type: Record / Conversation
* Description: Metadata linking generated files or datasets to the specific point in the conversation where they were created.

### **Artifact Data (artifacts/\[name].\[file-format])**

* Type: File / Artifact
* Description: The actual output files generated by the system during a session. The file format depends on the type of artifact:
  * Dataframe artifact: .parquet (Parquet tables)
  * Notebook artifact: .ipynb (Jupyter Notebooks)
  * Model artifact: .pkl (a pickled [scikit-learn Estimator](https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html) object)
  * Chart artifact: .png (PNG images)
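The extension list above can be expressed as a small lookup table. This is an illustrative sketch only: the type labels (`dataframe`, `notebook`, `model`, `chart`) and the helper `artifact_kind` are hypothetical names chosen for this example, while the extensions are the ones listed above.

```python
from pathlib import PurePosixPath

# Extension per artifact type, as listed in this section.
ARTIFACT_EXTENSIONS = {
    "dataframe": ".parquet",
    "notebook": ".ipynb",
    "model": ".pkl",
    "chart": ".png",
}


def artifact_kind(path: str) -> str:
    """Return the artifact type implied by a file's extension."""
    suffix = PurePosixPath(path).suffix.lower()
    for kind, extension in ARTIFACT_EXTENSIONS.items():
        if extension == suffix:
            return kind
    raise ValueError(f"Unrecognized artifact extension: {suffix!r}")
```

A caller could use the returned label to pick a suitable reader for each file. Note that unpickling a `.pkl` file executes code embedded in it, so model artifacts should only be loaded from trusted sessions.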

### **Plan-Conversation (plans.record)**

* Type: Record / Conversation
* Description: A record of the execution steps or "thought process" the agent intended to follow to fulfill user requests.

### **Session Log (log.txt)**

* Type: File / Conversation
* Description: A plaintext execution log used for debugging and tracking backend system calls during the conversation.
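Taken together, the entries in this section describe what a conversation workspace contains. The sketch below checks a directory listing against that set, assuming every entry sits directly inside `conversations/[conversation_uuid]/` and that all of them are expected to exist (a new session may legitimately lack some). `EXPECTED_ENTRIES` and `missing_entries` are hypothetical names.

```python
from typing import Iterable, List

# Entries described in this section; assumed to be direct children
# of the conversation directory.
EXPECTED_ENTRIES = (
    "history",
    "artifacts.record",
    "artifacts",
    "plans.record",
    "log.txt",
)


def missing_entries(present: Iterable[str]) -> List[str]:
    """Return the expected entry names that are absent from a directory listing."""
    present_names = set(present)
    return [name for name in EXPECTED_ENTRIES if name not in present_names]
```

In practice the listing could come from `os.listdir()` on a conversation directory; keeping the function independent of the filesystem makes it easy to reuse with other storage backends.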


