HEAVY.AI Docs
v9.0.0 (Latest)

Box/Whisker and Violin Plots


Last updated 11 days ago

This chart contains two related chart types: Box and Whisker plots and Violin plots. A Box and Whisker plot is a statistical chart that shows the five-number summary of a dataset: the minimum, first quartile, median, third quartile, and maximum. A Violin plot visualizes the distribution of quantitative data using density curves. It also includes an embedded box plot, so you can see the data density alongside the summary statistics that the box plot highlights.
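
To make the five-number summary concrete, the following Python sketch computes it for a single group. It uses the standard library's `statistics.quantiles` with the "inclusive" method, which is one of several common quartile conventions; the method HeavyImmerse actually uses is not documented on this page and may differ slightly. The function name and the sample data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a non-empty sequence of numbers.

    Quartiles use statistics.quantiles(method="inclusive"); this convention
    is an assumption for the sketch, not necessarily what Immerse uses.
    """
    data = sorted(values)
    if not data:
        raise ValueError("five_number_summary requires at least one value")
    if len(data) == 1:
        # quantiles() needs two or more points on older Python versions.
        v = data[0]
        return (v, v, v, v, v)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])

# Hypothetical points conceded by one home team (illustrative data only).
scores = [98, 102, 95, 110, 104, 99, 120, 101]
print(five_number_summary(scores))  # -> (95, 98.75, 101.5, 105.5, 120)
```

The box spans Q1 to Q3, the center line marks the median, and the whiskers reach toward the minimum and maximum.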

The chart requires one categorical dimension for grouping the data, and one numeric measure whose distribution will be displayed within each group.

Feature      Quantity   Notes
Dimensions   1          Required. Must be categorical; Dimension 1 = X-axis
Measures     1          Required. Must be quantitative; Measure 1 = Y-axis

Use a Box Plot chart to get a clear picture of how your data is distributed across categories and to compare those distributions. Box plots show where most of your data lies, how consistent or varied it is, and whether there are any unusual data points.

Box Plot vs Violin Plot

You can switch between the two plot types in the "Graphical Settings" section of the Settings panel on the right side of the chart editor.

Scaling

By default, Box and Violin plots will auto-scale to highlight the core spread of your data and prevent extreme outliers from compressing the view. This ensures that the box and violin shapes, which represent the majority of your data points, remain clear and easy to interpret, even when very large or small outliers are present.

Outliers

Outlier points can be toggled on by clicking the "Show outliers" option in the Settings panel on the right. If toggled on, the chart will re-scale to show the full data range.

Not all outliers are shown on the chart; only the 10 highest and 10 lowest outliers per group are plotted.

Because of the added query complexity, outlier points are disabled by default. To show them by default, set the ui/enable_box_plot_outliers_default feature flag to ON.
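
This page does not say how Immerse decides which points count as outliers. The Python sketch below assumes Tukey's common rule (values more than 1.5 times the interquartile range below the first quartile or above the third quartile) purely for illustration, then keeps at most the 10 lowest and 10 highest outliers per group, matching the per-group cap described above. The function name and its parameters are hypothetical.

```python
import statistics
from collections import defaultdict

def capped_outliers(rows, k=10, whisker=1.5):
    """Group (group, value) rows and return {group: (low_outliers, high_outliers)}.

    ASSUMPTION: outliers are values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]
    (Tukey's rule). Only the k most extreme values on each side are kept.
    """
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)

    result = {}
    for group, values in groups.items():
        if len(values) < 2:
            result[group] = ([], [])  # too few values to estimate quartiles
            continue
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
        # k lowest values below the lower fence, in ascending order
        low = sorted(v for v in values if v < low_fence)[:k]
        # k highest values above the upper fence, in descending order
        high = sorted((v for v in values if v > high_fence), reverse=True)[:k]
        result[group] = (low, high)
    return result

# Hypothetical airtime rows: one typical value plus a few extreme flights.
rows = [("WA", 100)] * 40 + [("WA", 900), ("WA", 5)]
print(capped_outliers(rows))
```

Capping each side at `k` keeps the number of plotted points per group bounded no matter how many rows fall outside the fences.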

Color Palette

By default, Box Plot charts are automatically colored by the selected categorical dimension. Use the palette picker to change the categorical palette, or choose a solid color if you don't want categorical coloring. Box Plots also support color mappings.

Data and Formatting Settings

Sorting

By default, Box Plots will sort by # Records, so that the group with the most values is displayed first. Other sorting options can be chosen from the drop-down menu.

# of Groups

This setting controls how many groups from the selected categorical dimension are displayed. The default is 50.
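
The default Sorting and # of Groups settings can be pictured as counting records per group, ordering groups by that count (largest first), and keeping the first N. This Python sketch models only the default # Records sort with a limit of 50; breaking ties alphabetically is an assumption, and the function name is hypothetical.

```python
from collections import Counter

def top_groups(categories, limit=50):
    """Return up to `limit` group names ordered by record count, largest first.

    Ties are broken by group name (an assumption for this sketch).
    """
    counts = Counter(categories)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ordered[:limit]]

# Hypothetical home_team values, one per game row.
games = ["CLE", "BKN", "CLE", "NOP", "BKN", "CLE"]
print(top_groups(games, limit=2))  # -> ['CLE', 'BKN']
```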

Violin Distribution Precision

The violin density curves are drawn by dividing the current chart's y-axis range into a set number of bins. This setting controls the number of bins used to draw the violins. The default is 60, and the maximum is 200.

If you find the violin curves appear jagged or imprecise, increasing this setting will improve their visual accuracy, although it will slightly increase query complexity.
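
To make the Violin Distribution Precision setting concrete, here is a minimal Python sketch of binned density along the y-axis. Only the default of 60 bins and the maximum of 200 come from this page; the equal-width bins, ignoring values outside the axis range, and scaling so the densest bin equals 1.0 are assumptions for illustration. The function name is hypothetical.

```python
def violin_density(values, y_min, y_max, bins=60):
    """Return normalized per-bin densities for one group's violin.

    Assumptions for this sketch: equal-width bins over [y_min, y_max],
    values outside the axis range are ignored, and counts are scaled so
    the densest bin equals 1.0. `bins` is capped at 200, the documented max.
    """
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")
    bins = max(1, min(int(bins), 200))
    width = (y_max - y_min) / bins
    counts = [0] * bins
    for v in values:
        if y_min <= v <= y_max:
            # v == y_max would index one past the end, so clamp to the last bin
            idx = min(int((v - y_min) / width), bins - 1)
            counts[idx] += 1
    peak = max(counts)
    return [c / peak if peak else 0.0 for c in counts]

# Hypothetical opponent scores drawn on an 80-140 point axis with 6 bins.
print(violin_density([95, 98, 99, 101, 102, 104, 110, 120], 80, 140, bins=6))
```

With more bins, each bin covers a narrower slice of the axis, so the outline follows the data more closely, at the cost of more work per query.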

Center Line

Choose whether the line within the box plot represents the median or the mean.
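
The choice matters most for skewed data, because a few extreme values pull the mean but barely move the median. The hypothetical helper below illustrates the difference with made-up airtime values; it is not an Immerse API, and its "median" default is chosen for the sketch only.

```python
import statistics

def center_line(values, mode="median"):
    """Return the box plot center value: 'median' (sketch default) or 'mean'."""
    if mode == "median":
        return statistics.median(values)
    if mode == "mean":
        return statistics.mean(values)
    raise ValueError("mode must be 'median' or 'mean'")

# Hypothetical airtime values (minutes) with one unusually long flight.
airtime = [60, 62, 65, 70, 400]
print(center_line(airtime))          # -> 65
print(center_line(airtime, "mean"))  # -> 131.4
```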

Formatting Settings

Example 1: Understanding Box Plots

In our first example, we explore a professional basketball dataset. In the chart below, we've created a Box Plot from an NBA games dataset, choosing home_team as our dimension and road_team_final_score as our measure. This setup lets us analyze how many points each home team typically gives up to its opponents.

Looking at the chart, you can quickly see how powerful Box Plots are for making inferences. It's immediately obvious which teams are strong at limiting opponent points when playing at home, and which are not. Cleveland is clearly one of the best defensive teams: it has the lowest first quartile (the bottom of the box) and the lowest median (the center line in the box) of any team shown.

Box plots also reveal inconsistency. Consider New Orleans next to Brooklyn. While both teams have similar medians and third quartiles, New Orleans' box plot has much longer whiskers in both directions. This clearly indicates a greater variance in the number of points they give up while playing at home, showing they are a less consistent defensive team than Brooklyn.

Example 2: Understanding Violin Plots

Turn on the Violin Plot to get an even deeper look at this data. The violin shapes reveal the density and clustering within the data; in other words, which point totals each home team most commonly gives up to opponents.

Looking at Cleveland again, it's now even more obvious that they're a strong defensive team. The violin plot clearly shows significant clusters of opponent scores around and below the 100-point mark, indicating their consistent ability to limit scoring. Conversely, weaker defensive teams such as San Antonio see most of their opponents' scores cluster above 110 points.

You can also clearly visualize consistency. Consider Miami: their violin plot shows virtually all of their opponents' scores clustered consistently between 90 and 120 points.

Seeing where values commonly occur gives critical insight into the underlying patterns and typical behavior of the data: not just the full range of outcomes, but where the data actually concentrates and how likely different values are.

Example 3: Exploring Outliers

This example shows how outliers affect Box Plots and how to read central tendencies when the data range is large. For this example, we'll explore a dataset of airline flights. Our dimension is dest_state (destination state) and our measure is airtime, so we can see which destinations have the longest flight times.

By default, with outliers turned off, the chart automatically scales to focus on the core distribution of the data, so the box plots remain clearly visible and easy to interpret. Unsurprisingly, Hawaii has by far the highest flight times of the states shown.

But how does this view change if we turn on outliers?

When we enable outliers, you'll immediately notice how the chart's scale adjusts to fully include these extreme values.

For efficiency, Immerse only displays the 10 highest and 10 lowest outliers for each group, if applicable.

Notice that Washington, which already showed higher-than-normal flight times, now displays many extreme outliers. This likely indicates a significant number of overseas flights to this state, which widens its overall range considerably.

In contrast, some states have few or no outliers, showing that their flight times are much more consistently clustered around the main body of the data, with very few unusually long flights for those destinations.

Outliers are invaluable for identifying unusual or extreme data points that fall far outside the typical range, providing a complete picture of your data's spread.

Displaying outliers can affect performance. We recommend enabling outliers only when you specifically need to investigate extreme values; otherwise, keeping them off ensures faster loading and a focused view of your core data.


You can use custom measure formats for the values in your chart. See Customizing Measure and Date Formats.
[Figure captions: Violin Plot; Cleveland: lowest scores against at home; Brooklyn vs. New Orleans points variance; Clustering of points scored against teams; Hawaii: longest flight times; Easily discernible outliers]