Storage Architecture & File Specification

The following section outlines the directory structure and file types within the user project. The root folder for all of these files and folders is /automl.

Data Cache (Vector & Graph Stores)

  • Cached Data Index (cache/weaviate/)

    • Type: Tarball / Dataset

    • Description: Compressed Weaviate vector database indices used for high-performance semantic search across the metadata of datasets.

  • Cached Dataset Lexical Strand (cache/surrealdb/datasets/)

    • Type: Tarball / Dataset

    • Description: Lexical and relational metadata for specific datasets stored in SurrealDB format to support structured queries.

  • Cached Cohort Lexical Strand (cache/surrealdb/cohorts/)

    • Type: Tarball / Cohort

    • Description: Specific relational data and indices for defined patient or sample cohorts.
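
To make the cache layout concrete, the sketch below walks the three cache directories listed above and reports which files in each are readable tar archives. It is an illustration only: the function name `list_cache_tarballs` and the labels in `CACHE_SUBDIRS` are hypothetical, and the sketch assumes nothing about how the tarballs are named or compressed.

```python
import tarfile
from pathlib import Path

# Relative cache locations as listed in this section.
# The dictionary keys are hypothetical labels chosen for this sketch.
CACHE_SUBDIRS = {
    "weaviate": "cache/weaviate",
    "surrealdb_datasets": "cache/surrealdb/datasets",
    "surrealdb_cohorts": "cache/surrealdb/cohorts",
}


def list_cache_tarballs(root: Path) -> dict[str, list[str]]:
    """Return the tar archive file names found in each cache directory."""
    found: dict[str, list[str]] = {}
    for label, relative in CACHE_SUBDIRS.items():
        directory = root / relative
        if not directory.is_dir():
            # A missing cache directory is reported as empty, not as an error.
            found[label] = []
            continue
        found[label] = sorted(
            entry.name
            for entry in directory.iterdir()
            if entry.is_file() and tarfile.is_tarfile(str(entry))
        )
    return found
```

Calling `list_cache_tarballs(Path("/automl"))` would inspect the documented root; any other directory with the same layout works as well.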

User & Session Management

  • User Profile Directory (users/user-[id]/)

    • Type: Directory / User

    • Description: The root container for all user-specific settings, preferences, and session history.

  • Settings (settings.record)

    • Type: Record / User

    • Description: A binary or structured record file containing user-specific application preferences and interface configurations.

  • Conversation-User Links (conversations-v2.record)

    • Type: Record / User

    • Description: A mapping file that associates specific conversation IDs with the user's profile, used for populating the user's chat history list.
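
As a rough illustration of how the user paths above fit together, the following sketch builds them from a user ID. The helper names are hypothetical, and placing settings.record and conversations-v2.record directly inside the user profile directory is an assumption based on how this section groups them.

```python
from pathlib import PurePosixPath

# Root folder stated at the top of this section.
ROOT = PurePosixPath("/automl")


def user_dir(user_id: str) -> PurePosixPath:
    """Profile directory for one user: users/user-[id]/."""
    return ROOT / "users" / f"user-{user_id}"


def settings_path(user_id: str) -> PurePosixPath:
    """Assumed location of the user's settings record."""
    return user_dir(user_id) / "settings.record"


def conversation_links_path(user_id: str) -> PurePosixPath:
    """Assumed location of the conversation-user link record."""
    return user_dir(user_id) / "conversations-v2.record"
```

The sketch only constructs paths. It does not read the .record files, since their internal format is not specified here.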

Conversation Session Details

Conversation Session (conversations/[conversation_uuid]/)

  • Type: Directory / Conversation

  • Description: A dedicated workspace for a single AI interaction session, containing all logs, plans, and generated artifacts.

Conversation History (history/chunk[nnn].record)

  • Type: Record / Conversation

  • Description: Segmented log files (chunks) containing the actual dialogue exchange between the user and the AI.
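
Because the history is split across numbered chunks, a reader has to put the files in order before replaying the dialogue. The sketch below does this by parsing the number out of each file name. It assumes that [nnn] is a decimal index and that a lower index means earlier dialogue; the function name `ordered_chunks` is hypothetical.

```python
import re
from pathlib import Path

# Assumed file-name pattern: "chunk" + decimal digits + ".record".
CHUNK_PATTERN = re.compile(r"^chunk(\d+)\.record$")


def ordered_chunks(history_dir: Path) -> list[Path]:
    """Return the chunk files in history_dir sorted by numeric index."""
    indexed = []
    for path in history_dir.iterdir():
        match = CHUNK_PATTERN.match(path.name)
        if match and path.is_file():
            indexed.append((int(match.group(1)), path))
    # Sort on the parsed integer so that chunk10 follows chunk9
    # even when the indices are not zero-padded.
    return [path for _, path in sorted(indexed, key=lambda pair: pair[0])]
```

Sorting on the parsed integer instead of the raw file name keeps the order correct whether or not the indices are zero-padded.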

Artifact-Conversation Map (artifacts.record)

  • Type: Record / Conversation

  • Description: Metadata linking generated files or datasets to the specific point in the conversation where they were created.

Artifact Data (artifacts/[name].[file-format])

  • Type: File / Artifact

  • Description: The actual output files generated by the system during a session. The file format depends on the type of artifact:

    • Dataframe artifact: .parquet (Parquet tables)

    • Notebook artifact: .ipynb (Jupyter Notebooks)

    • Model artifact: .pkl (Pickled objects of a scikit-learn Estimator)

    • Chart artifact: .png (PNG images)
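
The extension-to-type mapping above can be captured in a small lookup table. This is a minimal sketch: the kind labels and the function name `artifact_kind` are hypothetical, and only the four extensions listed above are recognized.

```python
from pathlib import Path
from typing import Optional

# Extension-to-kind table taken from the list above.
ARTIFACT_KINDS = {
    ".parquet": "dataframe",
    ".ipynb": "notebook",
    ".pkl": "model",
    ".png": "chart",
}


def artifact_kind(path: str) -> Optional[str]:
    """Return the artifact kind for a file name, or None if unrecognized."""
    return ARTIFACT_KINDS.get(Path(path).suffix.lower())
```

Note that unpickling a .pkl file can execute arbitrary code, so model artifacts should only be loaded when they come from a trusted session.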

Plan-Conversation (plans.record)

  • Type: Record / Conversation

  • Description: A record of the execution steps or "thought process" the agent intended to follow to fulfill user requests.

Session Log (log.txt)

  • Type: File / Conversation

  • Description: A plaintext execution log used for debugging and tracking backend system calls during the conversation.
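
Taken together, the entries in this part of the section describe a fixed layout inside each conversation directory. The sketch below gathers that layout into one small class. The class and function names are hypothetical, and placing conversations/ directly under /automl is an assumption, since this section does not state the parent of that directory.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ConversationPaths:
    """Paths inside one conversations/[conversation_uuid]/ directory."""

    root: PurePosixPath

    @property
    def history_dir(self) -> PurePosixPath:
        return self.root / "history"

    @property
    def artifact_map(self) -> PurePosixPath:
        return self.root / "artifacts.record"

    @property
    def plans(self) -> PurePosixPath:
        return self.root / "plans.record"

    @property
    def log(self) -> PurePosixPath:
        return self.root / "log.txt"

    def artifact(self, name: str, file_format: str) -> PurePosixPath:
        """Path of one artifact file: artifacts/[name].[file-format]."""
        return self.root / "artifacts" / f"{name}.{file_format}"


def conversation_paths(
    conversation_uuid: str,
    base: PurePosixPath = PurePosixPath("/automl"),
) -> ConversationPaths:
    # Assumption: conversations/ sits directly under the root folder.
    return ConversationPaths(base / "conversations" / conversation_uuid)
```

If conversation directories actually live elsewhere, for example under a user profile directory, passing a different `base` adapts the sketch without further changes.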
