In App Features

Please refer to the Usage and Limitations section before using AutoML Assistant. A license is required to use the Assistant.

Plugin Custom Knowledge Base

This feature empowers users to integrate their own proprietary knowledge into AutoML Assistant for more context-aware and organization-specific responses, allowing the Assistant to reason over private data securely.

This application uses a self-hosted Weaviate instance (see License Information) as part of the extended RAG architecture. The feature is under active development and can be extended to other types of semantic or graph databases (e.g., Neo4j).

If you want to connect an additional knowledge base to this application, please contact [email protected].

Figure 1. Custom knowledge base specification

Execution Plans

Before running any long-running task—such as model training, statistical analysis, or survival analysis—AutoML Assistant generates a step-by-step Execution Plan. This preview lets you review exactly what the Assistant will do, giving you full transparency and control before anything is executed.

Once approved, the task is executed and the plan is automatically saved to the Execution Plans sidebar as part of your Conversation Assets, where it can be revisited at any time. This helps you clearly understand the workflow, track executed steps, and maintain a reliable record for reproducibility and audit purposes.

Figure 2. The Execution Plans sidebar is on the left of your conversation

Conversation Artifacts

Another part of the Conversation Assets is the Artifact section. The Artifacts sidebar automatically captures and displays all key outputs generated during your session. This includes datasets, cohorts, dataframes, trained models, charts, transformation scripts, and other reusable objects created throughout the workflow. The Artifacts sidebar provides a centralized, organized view of your session’s progress, ensuring transparency, reproducibility, and easy reuse of all ML assets.

Figure 3. Key outputs generated during your session will be populated in the Artifacts sidebar

AutoML Assistant supports direct referencing of artifacts in your conversation: simply type the “@” symbol to trigger a dropdown list of available artifacts and mention them in your prompt. This makes it easy to build on previous work, compare results, or refine your analysis in a seamless and interactive way.

Figure 4. Type “@” to mention a specific artifact in your prompt

You can click on any listed artifact to open its detailed view, allowing you to inspect, interpret, and reuse results seamlessly without interrupting the ongoing conversation. Artifacts are organized into three groups: Data Artifacts, Logic Artifacts, and Visualization Artifacts. Below are the different artifact types accessible from the Artifacts sidebar.

Data Artifacts

Data Artifacts represent the core data objects created and transformed throughout a conversation. They serve as both the foundation and the outcomes of various analytical and machine learning tasks. These artifacts can be directly inspected, reused, or passed into Logic Artifacts to drive further analysis or model training.

Dataset Artifacts

When a conversation is created, the underlying Dataset used to create the cohorts is listed as an artifact in the Artifacts sidebar. This artifact serves as the foundation for cohort definition and downstream analysis.

By clicking on the dataset artifact, users can view important In-app information, such as the artifact ID, and Object information about the Dataset on the DNAnexus platform, such as the Dataset ID and Dataset name. This ensures full traceability of data sources and allows users to reference or reuse the dataset in other workflows directly within the conversation.

Figure 5. Details of a Dataset artifact

Cohort Artifacts

Any cohort selected to start a conversation with AutoML Assistant is automatically shown as a Cohort artifact in the Artifacts sidebar. This artifact provides two key sections of information:

  • The In-app information includes the artifact ID and a code snippet that users can use to load the cohort data directly within the ML JupyterLab environment for further analysis or modeling.

  • The Object information section offers detailed metadata from the platform, such as the Cohort ID, Cohort name, and the SQL query used to define the cohort.

This ensures transparency, traceability, and ease of reuse of the Cohorts across different analysis workflows.

Figure 6. Details of a Cohort artifact
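A minimal sketch of how a Cohort artifact's two information sections come together downstream. The SQL string, column names, and values here are illustrative assumptions, not the real In-app snippet or platform API; in the ML JupyterLab environment, the snippet shown in the artifact would materialize the cohort for you.

```python
import pandas as pd

# Example of the kind of SQL shown in the artifact's Object information
# section (illustrative, not a real cohort definition):
COHORT_SQL = "SELECT patient_id, age FROM dataset WHERE age >= 50"

# Stand-in for the cohort data the In-app code snippet would load
# (hypothetical values, same tabular shape as a materialized cohort):
cohort_df = pd.DataFrame({"patient_id": [101, 102], "age": [53, 67]})

# Downstream steps treat the cohort as an ordinary tabular input:
eligible = cohort_df[cohort_df["age"] >= 50]
```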

Dataframe Artifacts

A Dataframe artifact represents a structured dataset generated during the conversation. It can serve as either the input or the output of a Transformation Artifact, or as the input of an MLPipeline Artifact, depending on the workflow stage.

  • Format & Usability: Users can load a Dataframe Artifact directly into their analysis environment as either a Daft DataFrame or a Pandas DataFrame, making it flexible for both large-scale and lightweight data operations.

  • Role in Workflow:

  • As an input, it provides a clean and structured dataset for further transformations, feature engineering, or ML pipeline execution.

  • As an output, it captures the transformed or engineered dataset, ready for downstream tasks such as visualization, statistical testing, or model training.

  • Persistence & Reuse: Once created, a Dataframe Artifact can be reopened, inspected, and reused in future steps of the conversation without needing to recompute the transformations.

  • You can click on the ‘Export Notebook’ button to export the notebook and data required to reproduce this artifact in your ML JupyterLab notebook environment.

Figure 7. Details of a Dataframe artifact
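The two loading paths described above can be sketched as follows. The records are illustrative assumptions; in practice, the artifact's In-app code snippet loads the backing data from the platform.

```python
import pandas as pd

# Illustrative records standing in for a Dataframe artifact's contents:
records = [
    {"patient_id": 1, "age": 64, "outcome": 1},
    {"patient_id": 2, "age": 51, "outcome": 0},
    {"patient_id": 3, "age": 72, "outcome": 1},
]

# Lightweight path: a Pandas DataFrame for in-memory operations.
pdf = pd.DataFrame(records)
mean_age = pdf["age"].mean()

# Large-scale path (not run here): the same artifact could instead be
# loaded lazily as a Daft DataFrame, e.g. via daft.read_parquet(...),
# and converted with .to_pandas() only when a small result is needed.
```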

Model Artifacts

After AutoML Assistant successfully builds a machine learning model, a Model artifact is automatically generated and displayed in the Artifacts sidebar. This artifact represents the trained model along with its configuration details.

By clicking the model artifact, users can view information such as the artifact ID and a ready-to-use code snippet for loading and applying the model directly within the ML JupyterLab environment. When loaded, the artifact becomes a scikit-learn–compatible model object, making it easy to integrate into downstream workflows for evaluation, prediction, or deployment.
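A short sketch of what a loaded Model artifact behaves like. The training data and the stand-in estimator are assumptions for illustration; the point is that the loaded object follows the scikit-learn estimator interface, so the usual `predict` / `predict_proba` calls work downstream.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data (illustrative only):
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# Stands in for the model object the In-app code snippet would load:
model = LogisticRegression().fit(X, y)

# Any scikit-learn-compatible model supports standard downstream calls:
preds = model.predict(np.array([[0.5], [2.5]]))
probs = model.predict_proba(np.array([[0.5], [2.5]]))
```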

To view the model in the MLflow Tracking Server, click the MLflow button in the bottom-right corner.

You can click on the ‘Export Notebook’ button to export the notebook and data required to reproduce this artifact in your ML JupyterLab notebook environment.

Figure 8. Details of a Model artifact

Logic Artifacts

Logic Artifacts represent the AI-generated Python scripts that handle the analytical and machine learning logic within a conversation. Unlike Data Artifacts, which store and represent data, Logic Artifacts define the operations performed on that data. You can directly use these scripts for refinements and validation in the JupyterLab notebook environment. They are central to enabling reproducibility, automation, and transparency in the workflow, as each artifact captures the exact code used to produce results.

Transformation Artifacts

A Transformation artifact represents an AI-generated Python script designed to perform analytical and preprocessing tasks within the ML workflow. It acts as a flexible bridge between raw or curated data and downstream analysis or modeling.

Figure 9. Details of a Transformation artifact

  • Inputs: One or multiple artifacts can serve as inputs, including Dataset, Cohort, or Dataframe artifacts. These inputs provide the data on which the transformation script operates.

  • Outputs: A transformation can generate a new Dataframe Artifact (reflecting the transformed data) or a new Model Artifact; in cases where the transformation is purely diagnostic or exploratory, it may produce no output artifact, only logs or plain-text summaries.

  • Role in Workflow: Transformation artifacts help bridge exploratory analysis with modeling by producing clean, feature-ready datasets or by highlighting insights that inform subsequent steps such as AutoML or custom model training.

You can also click on the ‘Export Notebook’ button to export the notebook and data required to reproduce this artifact in your ML JupyterLab notebook environment.
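The input/output pattern above can be sketched as a small transformation script: a Dataframe-like input goes in, and a new, feature-ready dataframe comes out. The column names and derived features are illustrative assumptions, not taken from a real generated script.

```python
import pandas as pd

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative transformation: derive features from raw columns."""
    out = df.copy()
    # Feature engineering step (hypothetical columns):
    out["bmi"] = out["weight_kg"] / (out["height_m"] ** 2)
    out["age_group"] = pd.cut(
        out["age"], bins=[0, 40, 65, 120],
        labels=["young", "middle", "senior"],
    )
    # Drop rows where the derived feature could not be computed:
    return out.dropna(subset=["bmi"])

# Input stands in for a Dataframe artifact; output would become a new one.
df_in = pd.DataFrame({"age": [35, 70],
                      "weight_kg": [70.0, 80.0],
                      "height_m": [1.75, 1.80]})
df_out = transform(df_in)
```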

Visualization Artifacts

Chart Artifacts

The Chart artifact stores all visual outputs generated during your conversation—such as feature importance plots, cohort comparison charts, and model performance visualizations. Each chart is automatically saved to the Artifacts sidebar, allowing you to revisit, download, or reuse these visuals in reports, presentations, or downstream analysis.

You can also click on the ‘Export Notebook’ button to export the notebook and data required to reproduce this artifact in your ML JupyterLab notebook environment.

This ensures that every insight produced by AutoML Assistant is preserved, traceable, and easy to reference.

Figure 10. Details of a Chart artifact
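As an illustration of the kind of visual output a Chart artifact preserves, the sketch below renders a feature-importance bar chart headlessly and captures it as PNG bytes. The feature names and scores are made-up placeholders, and the rendering path here is generic matplotlib, not the Assistant's internal charting code.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless rendering, as in a server-side session
import matplotlib.pyplot as plt

# Illustrative feature-importance values (not from a real model):
features = ["age", "bmi", "glucose"]
importance = [0.45, 0.35, 0.20]

fig, ax = plt.subplots()
ax.barh(features, importance)
ax.set_xlabel("importance")
ax.set_title("Feature importance")

# Capture the rendered chart as PNG bytes for saving or reuse:
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
plt.close(fig)
```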
