Introduction to AutoML Assistant

Please refer to the Usage and Limitations section before using AutoML Assistant. A license is required to use AutoML Assistant.

What is AutoML Assistant?

AutoML Assistant is a new component of the AI/ML Accelerator package, implemented as a feature within ML JupyterLab. It is a conversational (GenAI-driven) tool that automates the end-to-end process of building, deploying, and optimizing machine learning models. Guided support and intelligent recommendations help users reach actionable insights faster.

AutoML Assistant uses a customer-provided LLM to help users create and refine prediction models from Apollo cohorts. Cohorts are a common data object on the DNAnexus Platform and are created using the DNAnexus Cohort Browser.

Target Audience/Users

AutoML Assistant is designed to support a diverse user base, ranging from translational scientists with limited coding experience to data scientists familiar with machine learning workflows. It empowers users at all levels to explore data, optimize models, and uncover actionable findings efficiently. By offering a conversational, GenAI-driven interface, AutoML Assistant lowers the barrier to entry, enabling even non-experts to build robust predictive models with ease. At the same time, experienced users can use it to streamline model development, fine-tune predictions, and efficiently explore actionable insights from the Apollo dataset.

Goal

The primary purpose of AutoML Assistant is to streamline the process of building and iterating on machine learning models, allowing users to work more efficiently without relying on extensive coding. Through natural-language interaction, users can guide the AutoML Assistant chatbot to perform tasks such as feature preparation, model training, and basic optimization. Once a model is created, the Assistant provides insights such as feature importance rankings, along with reusable Python artifacts (for example, the ML model, a Python notebook, dataframes, and charts). These outputs serve as starting points for further scientific analysis, supporting users in investigating patterns or potential signals in their data.

Role of the Large Language Model (LLM)

The integrated LLM serves as the intelligence layer that makes AutoML Assistant an effective tool for building and refining machine learning models. It interprets the structure and metadata of Apollo cohorts, helps users understand their data, and guides them through each stage of the ML development process. Whether suggesting relevant features, advising on preprocessing strategies, or helping iterate on modeling approaches, the LLM acts as a knowledgeable assistant that simplifies technical workflows. This allows users to focus on analytical questions and model-building goals, while the assistant handles much of the complexity behind the scenes.

Why use the AutoML Assistant?

Core values of AutoML Assistant:

  1. Democratize AI/ML and Foster Collaboration: Natural language interface and no-code workflows empower clinicians and translational researchers, while developers can reuse artifacts and accelerate prototyping.

  2. Streamline Cohort Analysis & Model Development: Automated data prep and statistical comparisons make it easy to explore cohorts and extract meaningful patterns.

  3. Ensure Security, Transparency, & Reproducibility: Traceability with reusable Python artifacts and visual outputs ensures insights are explainable, shareable, and easy to replicate.

  4. Accelerate Scientific Discovery & Decision Making: AutoML Assistant reduces model development time from weeks to hours, enabling faster, data-driven decisions.

  5. Provide Reusable Results: AI-generated artifacts give developers a head start on validation and refinement.

Core features of AutoML Assistant

For all users, AutoML Assistant provides:

  • Conversational, GenAI-Powered Guidance: Enables users to perform cohort analysis, feature engineering, model training, and interpretation, all through natural language. Example: "Use two cohorts, Responders and Non-responders, to build a predictive model that classifies whether a cancer patient will respond to the 5-FU treatment."

  • Context-Aware Integration with DNAnexus Apollo: Intelligently interprets data structures from Apollo cohorts (created via Omics Data Assistant (ODA) or Cohort Browser), bridging the gap from data selection to insight discovery.

  • Automated ML Pipeline Execution: Builds, trains, and evaluates models behind the scenes with common ML libraries such as PyCaret, PyTorch, and scikit-learn, minimizing manual steps while maximizing reproducibility.

  • Statistical Analysis & Visual Summarization: Performs key statistical comparisons and delivers clear, interactive visualizations for transparent, data-driven insights.
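To make the cohort-comparison idea above more concrete, here is a minimal sketch of how two cohorts might be labeled for classification and how their features could be ranked by a simple statistical comparison. It is an illustration only and does not show how AutoML Assistant works internally. The records, the field names (`age`, `marker_a`, `marker_b`), and the helper functions are all hypothetical, and the standardized mean difference is a stand-in for whatever statistics the Assistant actually computes.

```python
from statistics import mean, stdev

# Made-up records standing in for two cohorts; all values are illustrative only.
responders = [
    {"age": 52, "marker_a": 3.1, "marker_b": 0.8},
    {"age": 61, "marker_a": 2.9, "marker_b": 1.1},
    {"age": 47, "marker_a": 3.4, "marker_b": 0.9},
    {"age": 58, "marker_a": 3.0, "marker_b": 1.0},
]
non_responders = [
    {"age": 55, "marker_a": 1.2, "marker_b": 0.9},
    {"age": 49, "marker_a": 1.0, "marker_b": 1.2},
    {"age": 63, "marker_a": 1.5, "marker_b": 0.7},
    {"age": 50, "marker_a": 1.1, "marker_b": 1.0},
]


def build_labeled_rows(group_a, group_b):
    """Combine two cohorts into one table with a binary label column."""
    labeled_a = [dict(row, label=1) for row in group_a]
    labeled_b = [dict(row, label=0) for row in group_b]
    return labeled_a + labeled_b


def standardized_mean_difference(group_a, group_b, feature):
    """Difference in means divided by the pooled standard deviation."""
    a = [row[feature] for row in group_a]
    b = [row[feature] for row in group_b]
    pooled_sd = ((stdev(a) ** 2 + stdev(b) ** 2) / 2) ** 0.5
    if pooled_sd == 0:
        return 0.0
    return (mean(a) - mean(b)) / pooled_sd


def rank_features(group_a, group_b):
    """Rank features by how strongly they separate the two cohorts."""
    features = sorted(group_a[0].keys())
    scores = {
        name: standardized_mean_difference(group_a, group_b, name)
        for name in features
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


labeled_rows = build_labeled_rows(responders, non_responders)
ranking = rank_features(responders, non_responders)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

In this toy data, `marker_a` is constructed to differ sharply between the two groups, so it ranks first. A labeled table like `labeled_rows` is the kind of input a classifier from one of the libraries named above would then be trained on.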

For Data Scientists and Developers, AutoML Assistant provides:

  • Reusable Python Artifacts: Automatically generates dataframes, models, and Python scripts, ensuring results are reproducible, traceable, and ready for further exploration in ML JupyterLab.
