Welcome to HEAVY.AI Documentation
Use of HEAVY.AI is subject to the terms of the HEAVY.AI End User License Agreement (EULA) and Addendum (EULA).
Learn how to use Immerse to gain new insights to your data with fast, responsive graphics and SQL queries.
Learn how to install and configure your HEAVY.AI instance, then load data for analysis.
Learn how to extend HEAVY.AI with an integrated data science foundation and custom charts and interfaces. Contribute to the HEAVY.AI Core Open Source project.
For more complete release information, see the Release Notes.
We are pleased to introduce HeavyIQ, a custom LLM embedded within a brand new visual notebook interface. This combination of custom model and user experience represents our vision for the future of analytics. It supports the capabilities you’d expect, including English to SQL, English to SQL-backed answer and English to graphics. We think you will be very pleased with the “out of the box” results.
While HeavyIQ is certainly the headline, there are as always a number of additional features in this release. One not yet fully apparent to a casual user is support for table and column level metadata. This is available at 8.0 in SQL, and at release will already be used by HeavyIQ to help in table and column selection. In cases where table or column names are ambiguous, we’ve found that simply adding a clarifying metadata comment is an easy way to improve HeavyIQ accuracy.
At 8.0, we’ve also significantly improved our support for raster and multidimensional array datasets. Since most raster data is available on huge external data stores, we’ve added raster to HeavyConnect. Now, rather than importing these datasets, you have the option to link to them on-the-fly as needed. We’ve also changed the internal storage of rasters to use a tile-oriented approach aligned with fragments. This lowers memory requirements and improves performance by allowing fragment skipping. What we’ve not changed is our unified syntax for raster and vector processing. That continues to make use of raster data significantly easier than on systems with entirely different internal languages for raster and vector data processing.
Finally, this release includes major dependency updates and a more flexible license management system. The dependency updates should be transparent to most users, but are an important part of maintaining system security. The new licensing system deliberately mirrors those of our peers, now supporting “floating” as well as “node locked” licenses. As more of our customers deploy in the cloud, these new capabilities support more flexibility in resource management.
We hope you enjoy this major new release, and look forward to seeing how you put these new capabilities to work to expand the power and accessibility of visual analytics within your organizations.
We are also pleased to announce the general availability of our new backend Executor Resource Manager with CPU / GPU parallelism and query policy controls such as executor type, memory and time limits. We can also now support CPU queries larger than available CPU memory.
This release also features the debut of a user interface for joins in Immerse (beta), supporting inner and left joins which are named and persisted in dashboards. This provides analytic and visualization access to joined columns, complementing the prior table linking function supporting cross-filtering.
Powerful machine learning (beta) and statistical methods (beta) are now available in the database, supporting high performance predictive analytics workflows. For example you can now perform clustering or run linear regression or random forest models on large datasets with interactive inferencing.
Immerse also gains a large set of dashboard refinements, including an optional ‘minimalist’ style with hidden chart titles, and an optional new text chart with full HTML and font controls.
There are several major external dependency updates in this release. With Ubuntu 18 reaching its end of life, we now require Ubuntu 20.04. For similar reasons, we now support NVIDIA CUDA version 11.8, which deprecates support for Kepler GPUs. Last but not least, we are formally retiring polygon ‘render groups’ within the database, a change which is not backwards compatible, so full database backups are required as part of this upgrade.
New Features and Improvements
BETA: Joins in Immerse
BETA: Enhanced text chart. The flag `ui/enable_new_text_chart` adds a “text2” chart type, with additional features:
font family (e.g. arial)
font sizes, line height
colors populated from dashboard palette
html table
undo/redo
separator line with styles
full html support
Added a new “minimal” style mode in which chart titles are hidden by default but appear on rollover. Controlled by feature flag `ui/minimize_chart_size` which defaults to off
Within map chart editor geo layers are now renamable
Role-based access to control panel UX previously requiring admin access.
7.0 marks the beta release of HeavyML, a new set of capabilities to execute accelerated machine learning workflows directly from SQL.
General Capabilities and Methods
Named model creation is supported via a new CREATE MODEL statement (see the release notes and documentation for more details)
Row-wise inference (GPU-accelerated for GPU queries) can be performed via a new ML_PREDICT row-wise operator. This can be used as an Immerse custom measure and persisted into dashboards, allowing end-users to consume models without needing to know how to create or administer them.
An EVALUATE model function is provided to test models against metrics (such as r2).
Table functions are provided to access linear regression coefficients for linear regression models and variable importance scores for random forest models.
A new “SHOW MODELS” SQL command allows end users to determine which models are available.
More-detailed model metadata can be accessed by admins with SHOW MODEL DETAILS and in a new ml_models system table in the information_schema database.
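As an illustration of the kind of metric mentioned above for model evaluation, the coefficient of determination (r2) can be computed as in the following sketch. This is plain Python implementing the standard formula, not HeavyML's implementation; the function name r2_score and the sample values are hypothetical.

```python
def r2_score(actual, predicted):
    """Standard coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after applying the model.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("r2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out values and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]
score = r2_score(actual, predicted)
```

A score near 1 means the predictions explain most of the variance in the held-out values; a score near 0 means the model does little better than predicting the mean.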
Regression Algorithms
Four regression algorithms are supported initially: linear regression, random forest regression, decision trees, and Gradient Boosted Trees (GBT).
Both categorical text and continuous numeric regressors/predictors are supported. Categorical inputs are automatically one-hot-encoded.
Continuous variable prediction is supported initially; categorical classification is planned for a later release.
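For readers unfamiliar with one-hot encoding of categorical inputs, the idea can be sketched in a few lines of Python. This is a generic illustration only; how HeavyML orders categories or handles rare values is not described here, and the function name and sample data are hypothetical.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 indicator vector."""
    # One output column per distinct category, in sorted order.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one column is set per input row
        rows.append(row)
    return categories, rows


# Hypothetical categorical text column.
categories, encoded = one_hot_encode(["brick", "wood", "brick", "steel"])
```

Each text value becomes a numeric vector with a single 1, which lets a regression model treat the category as a set of numeric predictors.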
Clustering Algorithms
Two clustering algorithms are supported in this initial release: KMeans and DBScan.
Clustering algorithms can be called via associated table functions (more detail can be found in the relevant documentation), and currently support continuous numeric inputs only.
Performance and Administration
A new Executor Resource Manager (ERM) framework is provided
The ERM allows for CPU queries to run fully in parallel, and one or more CPU queries to run in parallel while a GPU query is executing (parallel GPU query kernel execution is not supported yet).
It also allows execution of CPU queries where the input datasets do not fit into the CPU buffer pool by executing on a fragment-by-fragment basis, paging from storage.
The Executor Resource Manager takes into account the resources needed for each query to schedule them in the most efficient manner.
It is enabled by default. It can be turned off with the flag --enable-executor-resource-mgr=0, which makes query kernel execution follow the same serial, pre-7.0 path.
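The fragment-by-fragment execution described above can be pictured with a small conceptual sketch: an aggregate is updated one fragment at a time, so the whole input never needs to be resident at once. This is only an analogy in plain Python, not the Executor Resource Manager's actual logic; the function names and the fragment size are hypothetical.

```python
def iter_fragments(rows, fragment_size):
    """Yield successive fixed-size fragments (a stand-in for paging from storage)."""
    for start in range(0, len(rows), fragment_size):
        yield rows[start:start + fragment_size]


def streaming_mean(fragments):
    """Update a running sum and count per fragment instead of loading all rows."""
    total = 0.0
    count = 0
    for fragment in fragments:
        total += sum(fragment)
        count += len(fragment)
    return total / count if count else None


# Hypothetical column values, processed three rows at a time.
values = list(range(1, 11))
result = streaming_mean(iter_fragments(values, fragment_size=3))
```

Only the running total and count persist between fragments, which is why memory use is bounded by the fragment size rather than the table size.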
New Features and Improvements
A new “cell editor” is provided. This supports multi-band antennas mounted within various sites within a cell. Various antenna attributes such as horizontal and vertical falloff can be easily applied based on an extensible library of antenna types.
Vegetation and building envelope attenuation can now be directly or indirectly specified. For example, typical values can be provided as scalar constants, or clutter object-specific attributes can be derived from normal SQL cursor queries. Vegetation attenuation can be tied to measurements of canopy moisture content from remote sensing based on seasonal statistics, or for individual dates to match drive test data. Building attenuation can be driven by various known or inferred characteristics, such as from parcels databases.
The right-hand information panel has been extended to better support targeting of large numbers of buildings. This can be done directly by searching and filtering on building attributes in the HeavyRF application, such as building type or size. But it can also be combined with analyses in Immerse extending to multiple arbitrary tags. For example, a set of locations with high customer value and high potential for churn can be identified in Immerse and tagged with attributes searchable in HeavyRF.
Last but not least, the HeavyRF platform will soon be available on NVIDIA’s LaunchPad. This facilitates initial evaluation of the software by making it immediately available together with appropriate supporting GPU hardware.
HEAVY.AI continues to refine and extend the data connectors ecosystem. This release features general availability of data connectors for PostgreSQL, beta Immerse connectors for Snowflake and Redshift, and SQL support for Google BigQuery and Hive (beta). These managed data connections let you use HEAVY.AI as an acceleration platform, wherever your source data lives. Scheduling and automated caching ensure that from an end-user perspective, fast analytics are always running on the latest available data.
Immerse features four new chart types: Contour, Cross-section, Wind barb and Skew-t. While these are especially useful for atmospheric and geotechnical data visualization, Contour and Cross-section also have more general application.
Major improvements for time series analysis have been added. This includes time series comparison via window functions, and a large number of SQL window function additions and performance enhancements.
This release also includes two major architectural improvements:
The ability to perform cross-database queries in SQL, increasing flexibility across the board.
Render queries no longer block other GPU queries. In many use cases, renders can be significantly slower than other common queries. This should result in significant performance gains, particularly in map-heavy dashboards.
Chart animation through cross filter replay, allowing controlled playback of time-based data such as weather maps or GPS tracks.
You can now directly export your charts and dashboards as image files.
New control panel enables administrators to view the configuration of the system and easily access logs and system tables.
HeavyConnect now provides graphical Heavy Immerse support for Redshift, Snowflake, and PostGIS connections.
For CPU-only systems, mapping capabilities are improved with the introduction of multilayer CPU-rendered geo.
Numerous improvements to core SQL and geoSQL capabilities.
Support for string to numeric, timestamp, date, and time types with the new TRY_CAST operator.
Explicit and implicit cast support for numeric, timestamp, date, and time types.
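The behavior of a "try" style cast can be sketched in Python as follows. The sketch assumes that a failed conversion yields NULL rather than raising an error, which is how TRY_CAST behaves in other SQL dialects; the helper name try_cast, the type labels, and the sample values are hypothetical.

```python
from datetime import date


def try_cast(text, target):
    """Return the converted value, or None (standing in for SQL NULL) on failure."""
    converters = {
        "int": int,
        "float": float,
        "date": date.fromisoformat,
    }
    try:
        return converters[target](text.strip())
    except (ValueError, AttributeError):
        # ValueError: the text is not a valid literal for the target type.
        # AttributeError: the input was None (NULL in, NULL out).
        return None


# Hypothetical raw string column with some unparseable entries.
raw = ["42", " 7 ", "n/a", ""]
cleaned = [try_cast(value, "int") for value in raw]
```

The practical benefit is that one malformed string does not abort a query over millions of rows; the bad values simply surface as NULLs that can be filtered or counted.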
Advanced string functions facilitate extraction of data from JSON and externally encoded string formats.
Improvements to COUNT DISTINCT reduce memory requirements considerably in cases with very large cardinalities or highly skewed data distributions.
Added MULTIPOINT and MULTILINESTRING geo types.
Convex and concave hull operators, allowing generation of polygons from points and multipoints. For example, you could generate polygons from clusters of GPS points.
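To make the convex hull idea concrete, here is a minimal sketch using Andrew's monotone chain algorithm, a textbook method. It is not HeavyDB's implementation, it does not cover concave hulls, and the function names and sample coordinates are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical cluster of GPS-like points: a square plus interior and edge points.
cluster = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
hull = convex_hull(cluster)
```

Interior and collinear points are discarded, leaving only the corner vertices that define the enclosing polygon.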
Syntax and performance optimizations across all geometry types, table orderings, and commonly nested functions.
Significant functionality extension of window functions; define windows directly in temporal terms, which is particularly important in time series with missing observations. Window frame support allows improved control at the edges of windows.
Two new functions now support direct loading of LiDAR data: tf_point_cloud_metadata quickly searches tile metadata and helps you find data to import, and tf_load_point_cloud performs the actual import.
Network graph analytics functions have been added. These can work on networks alone, including non-geographic networks, or can find the least-cost path along a geographic network.
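A least-cost path search of the kind described above can be sketched with Dijkstra's algorithm. This is a textbook version in plain Python, offered only to illustrate the concept; it says nothing about how the database functions are implemented, and the function name, edge list, and node labels are hypothetical.

```python
import heapq


def least_cost_path(edges, source, target):
    """Dijkstra's algorithm over directed (from, to, cost) edges with non-negative costs."""
    graph = {}
    for u, v, cost in edges:
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, [])
    best = {source: 0.0}   # cheapest known cost to reach each node
    previous = {}          # back-pointers for path reconstruction
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return None, []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return best[target], path


# Hypothetical road segments with travel costs.
roads = [("A", "B", 4), ("A", "C", 1), ("C", "B", 2), ("B", "D", 5), ("C", "D", 8)]
total_cost, route = least_cost_path(roads, "A", "D")
```

The same search works for non-geographic networks, since it depends only on nodes, edges, and costs.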
New spatial aggregation and smoothing functions. Aggregations work particularly well with LiDAR data--for example to pass through only the highest point within an area to create building or canopy height maps. Smoothing helps with noisy datasets and can reveal larger-scale patterns while minimizing visual distractions.
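The "highest point within an area" aggregation mentioned above can be illustrated with a simple grid-binning sketch. This is a conceptual Python example, not the database's aggregation function; the function name, cell size, and point values are hypothetical.

```python
import math


def max_height_grid(points, cell_size):
    """Bin (x, y, z) points into square cells and keep the highest z per cell."""
    grid = {}
    for x, y, z in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if cell not in grid or z > grid[cell]:
            grid[cell] = z
    return grid


# Hypothetical LiDAR returns: ground and rooftop hits in two adjacent cells.
returns = [(0.5, 0.5, 2.0), (0.9, 0.1, 7.5), (1.2, 0.3, 3.0), (1.8, 0.7, 1.0)]
height_map = max_height_grid(returns, cell_size=1.0)
```

Keeping only the maximum per cell is what turns a dense point cloud into a building or canopy height surface.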
Release 6.1.0 features more granular administrative monitoring dashboards based on logs. These have been accessible in an open format on the server side, and now they are available in Immerse, by specific dashboards, users, or queries. Intermediate and advanced SQL support continues to mature, with INSERT, window functions, and UNION ALL.
This release contains a number of user interface polish items requested by customers. Cartography now supports polygons with colorful borders and transparent fills. Table presentation has been enhanced in various ways, from alignment to zebra striping. And dashboard saving reminders have been scaled back, based on customer feedback.
The extension framework now features an enhanced “custom source” dialog, as well as new SQL commands to see installed extensions and their parameters. We introduce three new extensions. The first, tf_compute_dwell_times, reduces GPS event stream data volumes considerably while keeping relevant information. The others compute feature similarity scores and are very general.
This release also includes initial public betas of our PostgreSQL Immerse connector, and SQL support for COPY FROM ODBC database connections, making it easier to connect to your enterprise data.
This release features large advances in data access, including intelligent linking to enterprise data (HeavyConnect) and support for raster geodata. SQL support includes high-performance string functions, as well as enhancements to window functions and table unions. Performance improvements are noticeable across the product, including fundamental advances in rendering, query compilation, and data transport. Our system administration tools have been expanded with a new Admin Portal, as well as additional system tables supporting detailed diagnostics. Major strides in extensibility include new charting options and a new extensions framework (beta).
Rebranded platform from OmniSci to HEAVY.AI, with OmniSciDB now HeavyDB, OmniSci Render now HeavyRender, and OmniSci Immerse now Heavy Immerse.
HeavyConnect allows the HEAVY.AI platform to work seamlessly as an accelerator for data in other data lakes and data warehouses. For Release 6.0, CSV and Parquet files on local file systems and in S3 buckets can be linked or imported. Other SQL databases are also supported via ODBC (beta).
HeavyConnect enables users to specify a data refresh schedule, which ensures access to up-to-date data.
Heavy Immerse now supports import of dozens of raster data formats, including geoTIFF, geoJPEG, and PNG. HeavySQL now supports almost any vector GIS file format.
Support is included for multidimensional arrays common in the sciences, including Grib2, NetCDF, and hd5.
Immerse now supports linking or import of files on the server filesystem (local or mounted). This helps prevent slow data transfers when client bandwidth is limited.
File globbing and filtering allow import of thousands of files at once.
New Gauge chart for easy visualization of key metrics relative to target thresholds.
New landing page and Help Center.
Enhanced mapping workflows with automated column picking.
Support for a wide range of performant string operations using a new string dictionary translation framework, as well as the ability to on-the-fly dictionary encode none-encoded strings with a new ENCODE_TEXT operator.
Support for UNION ALL is now enabled by default, with significant performance improvements from the previous release (where it was beta flagged).
Significant functionality and performance improvements for window functions, including the ability to support expressions in PARTITION and ORDER clauses.
Parallel compilation of queries and a new multi-executor shared code cache provide up to 20% throughput/concurrency gains for interactive usage scenarios.
10X+ performance improvements in many cases for initial join queries via optimized Join Hash Table framework.
New result set recycler allows for expensive query sub-steps to be cached via the SQL hint /*+ keep_result */, which can significantly increase performance when a subquery is used across multiple queries.
Arrow execution endpoints now leverage the parallel execution framework, and Arrow performance has been significantly improved when high-cardinality dictionary-encoded text columns are returned
Introduces a novel polygon rendering algorithm that does not require pre-triangulated or pre-grouped polygons and can render dynamically generated geometry on the fly (via ST_Buffer). The new algorithm is comparable to its predecessor in terms of both performance and memory and enables optimizations and enhancements in future releases.
New binary transport protocol to Heavy Immerse that significantly increases performance and interactivity for large result sets
A new Admin Portal provides information on system resources usage and users.
System table support under a new information_schema database, containing 10 new system tables providing system statistics and memory and storage utilization.
New system and user-defined UDF framework (beta), comprising both row (scalar) and table (UDTF) functions, including the ability to define fast UDFs via Numba Python using the RBC framework, which are then inlined into the HeavyDB compiled query code for performant CPU and GPU execution.
System-provided table functions include generate_series for easy numeric series generation, tf_geo_rasterize_slope for fast geospatial binning and slope/aspect computation over elevation data, and others, with more capabilities planned for future releases.
Leveraging the new table function framework, a new HeavyRF module (licensed separately) includes tf_rf_prop and tf_rf_prop_max_signal table functions for fast radio frequency signal propagation analysis and visualization.
New Iframe chart type in Heavy Immerse to allow easier addition of custom chart types. (BETA)
Row-level security (RLS) can be used by an administrator to apply security filtering to queries run as a user or with a role.
Support for import from dozens of image and raster file types, such as jpeg, png, geotiff, and ESRI grid, including remote files.
Significantly more performant, parallelized window functions, executing up to 10X faster than in Release 5.9.
Automatic use of columnar output (instead of the default row-wise output) for large projections, reducing query times by 5-10X in some cases.
Support for full set of ST_TRANSFORM SRIDs supported by geos/proj4 library.
Support for numerous vector GIS files (100+ formats supported by current GDAL release).
Support for multidimensional array import from formats common in science and meteorology.
Improved Table chart export to access all data represented by a Table chart.
Introduced dashboard-level named custom SQL.
Significant speedup for POINT and fixed-length array imports and CTAS/ITAS, generally 5-20X faster.
The PNG encoding step of a render request is no longer a blocking step, providing improvement to render concurrency.
Adds support to hide legacy chart types from add/edit chart menu in preparation for future deprecation (defaults to off).
BETA - Adds custom expressions to table columns, allowing for reusable custom dimensions and measures within a single dashboard (defaults to off).
BETA - Adds Crosslink feature with Crosslink Panel UI, allowing crossfilters to fire across different data sources within the same dashboard (defaults to off).
BETA - Adds Custom SQL Source support and Custom SQL Source Manager, allowing the creation of a data source as a SQL statement (defaults to off)
Parallel execution framework is on by default. Running with multiple executors allows parts of query evaluation, such as code generation and intermediate reductions, to be executed concurrently. Currently available for single-node deployments.
Spatial joins between geospatial point types using the ST_Distance operator are accelerated using the overlaps hash join framework, with speedups up to 100x compared to Release 5.7.1.
Significant performance gains for many query patterns through optimization of query code generation, particularly benefitting CPU queries.
Window functions can now be executed without a partition clause being specified (to signify a partition encompassing all rows in the table).
Window functions can now execute over tables with multiple fragments and/or shards.
Native support for ST_Transform between all UTM Zones and EPSG:4326 (Lon/Lat) and EPSG:900913 (Web Mercator).
ST_Equals support for geospatial columns.
Support for the ANSI SQL WIDTH_BUCKET operator for easier and more performant numeric binning, now also used in Immerse for all numeric histogram visualizations
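The binning rule behind WIDTH_BUCKET can be sketched in Python as below. The sketch follows the ANSI definition for an ascending range only (lower bound less than upper bound): values below the range map to bucket 0 and values at or above the upper bound map to bucket count + 1. The function name and sample values are illustrative, not taken from the product.

```python
import math


def width_bucket(value, low, high, bucket_count):
    """Equal-width bucket number for value in [low, high), ANSI-style."""
    if bucket_count <= 0 or low >= high:
        raise ValueError("requires bucket_count > 0 and low < high")
    if value < low:
        return 0                 # underflow bucket
    if value >= high:
        return bucket_count + 1  # overflow bucket
    return math.floor((value - low) * bucket_count / (high - low)) + 1


# Hypothetical histogram over [0, 10) with 5 equal-width bins.
bins = [width_bucket(v, 0.0, 10.0, 5) for v in (-1.0, 0.0, 5.35, 9.99, 10.0)]
```

Because the bucket number is a single arithmetic expression per row, a histogram reduces to a group-by on that expression.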
The Vulkan backend renderer is now enabled by default. The legacy OpenGL renderer is still available as a fallback if there are blocking issues with Vulkan. You can disable the Vulkan renderer using the renderer-use-vulkan-driver=false configuration flag.
Vulkan provides improved performance, memory efficiency, and concurrency.
You are likely to see some performance and memory footprint improvements with Vulkan in Release 5.8, most significantly in multi-GPU systems.
Support for file path regex filter and sort order when executing the COPY FROM command.
New ALTER SYSTEM CLEAR commands that enable clearing CPU or GPU memory from Immerse SQL Editor or any other SQL client.
Extensive enhancements to Immerse support for parameters. Parameters can now be used in chart column selectors, chart filters, chart titles, global filters, and dashboard titles. Dashboards can have parameter widgets embedded on them, side-by-side with charts. Parameter values are visible in chart axes/labels, legends, and tooltips, and you can toggle parameter visibility.
In Immerse Pointmap charts, you can specify which color-by attribute always renders on top, which is useful for highlighting anomalies in data.
Significantly faster and more accurate "lasso" tool filters geospatial data on Immerse Pointmap charts, leveraging native geospatial intersection operations.
Immerse 3D Pointmap chart and HTML support in text charts are available as a beta feature.
Airplane symbol shape has been added as a built-in mark type for the Vega rendering API.
Vega symbol and multi-GPU polygon renders have been made significantly faster.
User-interrupt of query kernels is now on by default. Queries can be interrupted using Ctrl + C in omnisql, or by calling the interrupt API.
Parallel executors are in public beta (set with the --num-executors flag).
Support for APPROX_QUANTILE aggregate.
Support for default column values when creating a table and across all append endpoints, including COPY TO, INSERT INTO TABLE SELECT, INSERT, and binary load APIs.
Faster and more robust ability to return result sets in Apache Arrow format when queried from a remote client (i.e. non-IPC).
More performant and robust high-cardinality group-by queries.
ODBC driver now supports Geospatial data types.
Custom SQL dimensions, measures, and filters can now be parameterized in Immerse, enabling more flexible and powerful scenario analysis, projections, and comparison use cases.
New angle measure added to Pointmap and Scatter charts, allowing orientation data to be visualized with wedge and arrow icons.
Custom SQL modal with validation and column name display now enabled across all charts in Immerse.
Significantly faster point-in-polygon joins through a new range join hash framework.
Approximate Median function support.
INSERT and INSERT FROM SELECT now support specification of a subset of columns.
Automatic metadata updates and vacuuming for optimizing space usage.
Significantly improved OmniSciDB startup time, as well as a number of significant load and performance improvements.
Improvements to line and polygon stroke rendering and point/symbol rendering.
Ability to set annotations on New Combo charts for different dimension/measure combinations.
New ‘Arrow-over-the-wire’ capability to deliver result sets in Apache Arrow format, with ~3x performance improvement over Thrift-based result set serialization.
Support for concurrent SELECT and UPDATE/DELETE queries for single-node installations
Initial OmniSci Render support for CPU-only query execution ("Query on CPU, render on GPU"), allowing for a wider set of deployment infrastructure choices.
Cap metadata stored on previous states of a table by using MAX_ROLLBACK_EPOCHS, improving performance for streaming and small batch load use cases and modulating table size on disk
Added initial compilation support for NVIDIA Ampere GPUs.
Improved performance for UPDATE and DELETE queries.
Improved the performance of filtered group-by queries on large-cardinality string columns.
Added SQL function SAMPLE_RATIO, which takes a proportion between 0 and 1 as an input argument and filters rows to obtain a sampling of a dataset.
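The effect of filtering rows by a sampling proportion can be sketched as follows. This is a conceptual Python illustration using a seeded random number generator; how SAMPLE_RATIO actually selects rows inside the database is not described here, and the function name and seed parameter are hypothetical.

```python
import random


def sample_rows(rows, ratio, seed=0):
    """Keep each row independently with probability `ratio` (0 keeps none, 1 keeps all)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same rows
    return [row for row in rows if rng.random() < ratio]


# Hypothetical table of 1,000 row ids, sampled at roughly 10%.
rows = list(range(1000))
sample = sample_rows(rows, 0.1)
```

The sample size is approximate rather than exact, since each row is kept or dropped independently.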
Added support for exporting geo data in GeoJSON format.
Dashboard filter functionality is expanded, and filters can be saved as views.
You can perform bulk actions on the dashboard list.
New UI Setting panel in Immerse for customizing charts.
Tabbed dashboards.
SQL Editor now handles Vega JSON requests.
New Combo chart type in Immerse provides increased configurability and flexibility.
Immerse chart-specific filters and quick filters add increased flexibility and speed.
Updated Immerse Filter panel provides a Simple mode and Advanced mode for viewing and creating filters.
On multilayer charts, layer visibility can be set by zoom level.
Different map charts can be synced together for pan and zoom actions, regardless of data source.
Support for the Array type over JDBC.
SELECT DISTINCT in UNION ALL is supported. (UNION ALL is prerelease and must be explicitly enabled.)
Support for joins on DECIMAL types.
Performance improvements on CUDA GPUs, particularly Volta and Turing.
NULL support for geospatial types, including in ALTER TABLE ADD COLUMN.
SQL SHOW commands: SHOW TABLES, SHOW DATABASES, SHOW CREATE TABLE, and SHOW USER SESSIONS.
Ability to perform updates and deletes on temporary tables.
Updates to JDBC driver, including escape syntax handling for the fn keyword and added support to get table metadata.
Notable performance improvements, particularly for join queries, projection queries with order by and/or limit, queries with scalar subqueries, and multicolumn group-by queries.
Query interrupt capability has been improved to allow canceling long-running queries, and it now also supports JDBC.
Completely overhauled SQL Editor, including query formatting, snippets, history and more.
Database switching from within Immerse, as well as dashboard URLs that contain the database name.
Over 50% reduction in load times for the dashboards list initial load and search.
Cohort builder now supports count (# records) in aggregate filter.
Improved error handling and more meaningful error messages.
Custom logos can now be configured separately for light and dark themes.
Logos can be configured to deep-link to a specific URL.
Added support for UPDATE via JOIN with a subquery in the WHERE clause.
Initial support for TEMPORARY (that is, non-persistent) tables.
Improved performance for multi-column GROUP BY queries, as well as single column GROUP BY queries with high cardinality. Performance improvement varies depending on data volume and available hardware, but most use cases can expect a 1.5 to 2x performance increase over OmniSciDB 5.0.
Improved support for EXISTS and NOT EXISTS subqueries.
Added support for LINESTRING, POLYGON, and MULTIPOLYGON in user defined functions.
Immerse log-ins are fully sessionized and persist across page refreshes.
Pie chart now supports "All Others" and percentage labels.
Cohorts can now be built with aggregation-based filters.
New filter sets can be created through duplicating existing filter sets.
Dashboard URLs now link to individual filter sets.
The new filter panel in Immerse enables the ability to toggle filters on and off, and introduces Filter Sets to provide quick access to different sets of filters in one dashboard.
Immerse now supports using global and cross-filters to interactively build cohorts of interest, and the ability to apply a cohort as a dashboard filter, either within the existing filter set or in a new filter set.
Data Catalog, located within Data Import, is a repository of datasets that users can use to enhance existing analyses.
To see these new features in action, please watch this video from Converge 2019, where Rachel Wang demonstrates how you can use them.
Added support for binary dump and restore of database tables.
Added support for compile-time registered user-defined functions in C++, and experimental support for runtime user-defined SQL functions and table functions in Python via the Remote Backend Compiler.
Support for some forms of correlated subqueries.
Support for update via subquery, to allow for updating a table based on calculations performed on another table.
Multistep queries that generate large, intermediate result sets now execute up to 2.5x faster by leveraging new JIT code generator for reductions and optimized columnarization of intermediate query results.
Frontend-rendered choropleths now support the selection of base map layers.