HEAVY.AI Docs
v8.1.0
v8.1.0
  • Welcome to HEAVY.AI Documentation
  • Overview
    • Overview
    • Release Notes
  • Installation and Configuration
    • System Requirements
      • Hardware Reference
      • Software Requirements
      • Licensing
    • Installation
      • Free Version
      • Installing on Docker
        • HEAVY.AI Installation using Docker on Ubuntu
      • Installing on Ubuntu
        • HEAVY.AI Installation on Ubuntu
        • Install NVIDIA Drivers and Vulkan on Ubuntu
      • Installing on Rocky Linux / RHEL
        • HEAVY.AI Installation on RHEL
        • Install NVIDIA Drivers and Vulkan on Rocky Linux and RHEL
      • Getting Started on AWS
      • Getting Started on GCP
      • Getting Started on Azure
      • Getting Started on Kubernetes (BETA)
      • Upgrading
        • Upgrading HEAVY.AI
        • Upgrading from Omnisci to HEAVY.AI 6.0
        • CUDA Compatibility Drivers
      • Uninstalling
      • Ports
    • Services and Utilities
      • Using Services
      • Using Utilities
    • Executor Resource Manager
    • Configuration Parameters
      • Overview
      • Configuration Parameters for HeavyDB
      • Configuration Parameters for HEAVY.AI Web Server
      • Configuration Parameters for HeavyIQ
    • Security
      • Roles and Privileges
        • Column-Level Security
      • Connecting Using SAML
      • Implementing a Secure Binary Interface
      • Encrypted Credentials in Custom Applications
      • LDAP Integration
    • Distributed Configuration
  • Loading and Exporting Data
    • Supported Data Sources
      • Kafka
      • Using HeavyImmerse Data Manager
      • Importing Geospatial Data
    • Command Line
      • Loading Data with SQL
      • Exporting Data
  • SQL
    • Data Definition (DDL)
      • Datatypes
      • Users and Databases
      • Tables
      • System Tables
      • Views
      • Policies
      • Comment
    • Data Manipulation (DML)
      • SQL Capabilities
        • ALTER SESSION SET
        • ALTER SYSTEM CLEAR
        • DELETE
        • EXPLAIN
        • INSERT
        • KILL QUERY
        • LIKELY/UNLIKELY
        • SELECT
        • SHOW
        • UPDATE
        • Arrays
        • Logical Operators and Conditional and Subquery Expressions
        • Table Expression and Join Support
        • Type Casts
      • Geospatial Capabilities
        • Uber H3 Hexagonal Modeling
      • Functions and Operators
      • System Table Functions
        • generate_random_strings
        • generate_series
        • tf_compute_dwell_times
        • tf_feature_self_similarity
        • tf_feature_similarity
        • tf_geo_rasterize
        • tf_geo_rasterize_slope
        • tf_graph_shortest_path
        • tf_graph_shortest_paths_distances
        • tf_load_point_cloud
        • tf_mandelbrot*
        • tf_point_cloud_metadata
        • tf_raster_contour_lines; tf_raster_contour_polygons
        • tf_raster_graph_shortest_slope_weighted_path
        • tf_rf_prop_max_signal (Directional Antennas)
        • ts_rf_prop_max_signal (Isotropic Antennas)
        • tf_rf_prop
      • Window Functions
      • Reserved Words
      • SQL Extensions
      • HeavyIQ LLM_TRANSFORM
  • HeavyImmerse
    • Introduction to HeavyImmerse
    • Admin Portal
    • Control Panel
    • Working with Dashboards
      • Dashboard List
      • Creating a Dashboard
      • Configuring a Dashboard
      • Duplicating and Sharing Dashboards
    • Measures and Dimensions
    • Using Parameters
    • Using Filters
    • Using Cross-link
    • Chart Animation
    • Multilayer Charts
    • SQL Editor
    • Customization
    • Joins (Beta)
    • Chart Types
      • Overview
      • Bubble
      • Choropleth
      • Combo
      • Contour
      • Cross-Section
      • Gauge
      • Geo Heatmap
      • Heatmap
      • Linemap
      • Number
      • Pie
      • Pointmap
      • Scatter Plot
      • Skew-T
      • Table
      • Text Widget
      • Wind Barb
    • Deprecated Charts
      • Bar
      • Combo - Original
      • Histogram
      • Line
      • Stacked Bar
    • HeavyIQ SQL Notebook
  • HEAVYIQ Conversational Analytics
    • HeavyIQ Overview
      • HeavyIQ Guidance
  • HeavyRF
    • Introduction to HeavyRF
    • Getting Started
    • HeavyRF Table Functions
  • HeavyConnect
    • HeavyConnect Release Overview
    • Getting Started
    • Best Practices
    • Examples
    • Command Reference
    • Parquet Data Wrapper Reference
    • ODBC Data Wrapper Reference
    • Raster Data Wrapper Reference
  • HeavyML (BETA)
    • HeavyML Overview
    • Clustering Algorithms
    • Regression Algorithms
      • Linear Regression
      • Random Forest Regression
      • Decision Tree Regression
      • Gradient Boosting Tree Regression
    • Principal Components Analysis
  • Python / Data Science
    • Data Science Foundation
    • JupyterLab Installation and Configuration
    • Using HEAVY.AI with JupyterLab
    • Python User-Defined Functions (UDFs) with the Remote Backend Compiler (RBC)
      • Installation
      • Registering and Using a Function
      • User-Defined Table Functions
      • RBC UDF/UDTF Example Notebooks
      • General UDF/UDTF Tutorial Notebooks
      • RBC API Reference
    • Ibis
    • Interactive Data Exploration with Altair
    • Additional Examples
      • Forecasting with HEAVY.AI and Prophet
  • APIs and Interfaces
    • Overview
    • heavysql
    • Thrift
    • JDBC
    • ODBC
    • Vega
      • Vega Tutorials
        • Vega at a Glance
        • Getting Started with Vega
        • Getting More from Your Data
        • Creating More Advanced Charts
        • Using Polys Marks Type
        • Vega Accumulator
        • Using Transform Aggregation
        • Improving Rendering with SQL Extensions
      • Vega Reference Overview
        • data Property
        • projections Property
        • scales Property
        • marks Property
      • Migration
        • Migrating Vega Code to Dynamic Poly Rendering
      • Try Vega
    • RJDBC
    • SQuirreL SQL
    • heavyai-connector
  • Tutorials and Demos
    • Loading Data
    • Using Heavy Immerse
    • Hello World
    • Creating a Kafka Streaming Application
    • Getting Started with Open Source
    • Try Vega
  • Troubleshooting and Special Topics
    • FAQs
    • Troubleshooting
    • Vulkan Renderer
    • Optimizing
    • Known Issues and Limitations
    • Logs and Monitoring
    • Archived Release Notes
      • Release 6.x
      • Release 5.x
      • Release 4.x
      • Release 3.x
Powered by GitBook
On this page
  • General Syntax
  • Example
  • Making Predictions
Export as PDF
  1. HeavyML (BETA)
  2. Regression Algorithms

Decision Tree Regression

A simple, interpretable model

In general, we recommend starting with either Linear or Random Forests Regressions depending on data complexity. A single decision tree regression option is available for narrower use cases. One advantage of decision tree regression over random forest regression is its interpretability. A decision tree is an easy-to-understand model that can be easily visualized and interpreted. The tree structure allows one to see the sequence of decisions and criteria used to make predictions. This interpretability can be especially valuable in applications where the decision-making process needs to be transparent and understandable, such as in healthcare or finance.

Another advantage of decision tree regression is that it is computationally efficient compared to random forest regression. Because a decision tree is a single tree, it can be built and trained more quickly than an ensemble of trees in a random forest model. This efficiency can be important when dealing with real-time applications or large data sets where computational resources are limited.

Additionally, decision tree regression can be more appropriate for certain types of data sets or problems. For example, if the target variable has a strong linear relationship with a small subset of the features, a decision tree model might be able to capture this relationship more accurately than a random forest model. In such a case, a random forest model might overfit the data and produce less accurate predictions.

It is important to note, however, that decision tree regression models can also suffer from overfitting and may not perform as well as random forest models in many scenarios.

General Syntax

The model syntax follows general HeavyML conventions. You need to specify a SQL-legal model name, specify the type DECISION_TREE_REG, and provide a SELECT statement indicating the columns to use. The column projection statement must have the predicted variable first, then categorical columns and then continuous ones. Optionally, you can add a WITH statement to adjust categorical column processing using the same CAT_TOP_K and EVAL_FRACTION parameters discussed above.

Example

CREATE OR REPLACE MODEL fl_parcel_price_dt 
OF TYPE DECISION_TREE_REG AS 
SELECT
  SALEPRC1,
  PARUSEDESC,
  CNTYNAME,
  ACRES,
  TOTLVGAREA,
  EFFYRBLT,
  SALEYR1
FROM 
  fl_res_parcels_2018
WITH 
(CAT_TOP_K=70, EVAL_FRACTION=0.2)

This model trains in approximately 240ms. This is significantly faster than a full random forest regression on the same hardware which took several seconds. It produces a single tree model with r2 of 85% which is both useful and simple. While you would not expect this model to be as robust as random forests for application outside of its initial training domain, it might be acceptable or even preferred in terms of speed and explicability within its training domain.

Making Predictions

To make a prediction using your decision tree model, use the ML_PREDICT function. This can be used on-the-fly, or as part of an UPDATE or CTAS if you want to persist your predictions. The general form is:

ML_PREDICT('<model_name>', <predictor_1>, <predictor_2> ... <predictor_n>)

So in our example above, you'd use:

ML_PREDICT('fl_parcel_price_dt', PARUSEDESC, CNTYNAME, ACRES, TOTLVGAREA, EFFYRBLT, SALEYR1)

Note that we did not use the name of the variable to be predicted - that comes from the model itself.

PreviousRandom Forest RegressionNextGradient Boosting Tree Regression