Welcome to HEAVY.AI Documentation
Use of HEAVY.AI is subject to the terms of the HEAVY.AI End User License Agreement (EULA).
Learn how to use Immerse to gain new insights to your data with fast, responsive graphics and SQL queries.
Learn how to install and configure your HEAVY.AI instance, then load data for analysis.
Learn how to extend HEAVY.AI with an integrated data science foundation and custom charts and interfaces. Contribute to the HEAVY.AI Core Open Source project.
For more complete release information, see the Release Notes.
HEAVY.AI continues to refine and extend the data connectors ecosystem. This release features general availability of data connectors for PostgreSQL, beta Immerse connectors for Snowflake and Redshift, and SQL support for Google BigQuery and Hive (beta). These managed data connections let you use HEAVY.AI as an acceleration platform, wherever your source data lives. Scheduling and automated caching ensure that from an end-user perspective, fast analytics are always running on the latest available data.
Immerse features four new chart types: Contour, Cross-section, Wind barb and Skew-t. While these are especially useful for atmospheric and geotechnical data visualization, Contour and Cross-section also have more general application.
Major improvements for time series analysis have been added. This includes time series comparison via window functions, and a large number of SQL window function additions and performance enhancements.
This release also includes two major architectural improvements:
The ability to perform cross-database queries in SQL, increasing flexibility across the board.
Render queries no longer block other GPU queries. In many use cases, renders can be significantly slower than other common queries. This should result in significant performance gains, particularly in map-heavy dashboards.
Chart animation through cross filter replay, allowing controlled playback of time-based data such as weather maps or GPS tracks.
You can now directly export your charts and dashboards as image files.
New control panel enables administrators to view the configuration of the system and easily access logs and system tables.
HeavyConnect now provides graphical Heavy Immerse support for Redshift, Snowflake, and PostGIS connections.
For CPU-only systems, mapping capabilities are improved with the introduction of multilayer CPU-rendered geo.
Numerous improvements to core SQL and geoSQL capabilities.
Support for string to numeric, timestamp, date, and time types with the new TRY_CAST operator.
Explicit and implicit cast support for numeric, timestamp, date, and time types.
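The practical difference between a plain cast and TRY_CAST is what happens on bad input. The Python sketch below imitates that behavior under the assumption that TRY_CAST yields NULL when a string cannot be converted, as it does in other SQL dialects. The try_cast helper and the CONVERTERS table are hypothetical names used for illustration only; they are not part of any HEAVY.AI API.

```python
from datetime import date, datetime

# Hypothetical mapping from SQL-like type names to Python converters.
CONVERTERS = {
    "INT": int,
    "DOUBLE": float,
    "DATE": date.fromisoformat,
    "TIMESTAMP": datetime.fromisoformat,
}


def try_cast(value, type_name):
    """Convert a string, returning None (standing in for NULL) on failure."""
    if value is None:
        return None
    try:
        return CONVERTERS[type_name](value.strip())
    except ValueError:
        # A strict cast would raise here; the TRY_CAST-style variant does not.
        return None


if __name__ == "__main__":
    for raw in ["42", "12abc", " 7 "]:
        print(repr(raw), "->", try_cast(raw, "INT"))
    print(try_cast("2023-02-30", "DATE"))  # invalid calendar date
```

Returning a null instead of raising lets a single malformed value pass through a bulk conversion without aborting the whole query, at the cost of having to check for nulls afterward.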
Advanced string functions facilitate extraction of data from JSON and externally encoded string formats.
Improvements to COUNT DISTINCT reduce memory requirements considerably in cases with very large cardinalities or highly skewed data distributions.
Added MULTIPOINT and MULTILINESTRING geo types.
Convex and concave hull operators, allowing generation of polygons from points and multipoints. For example, you could generate polygons from clusters of GPS points.
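To make the idea of building a polygon from a cluster of points concrete, here is a minimal Python sketch of a convex hull using Andrew's monotone chain algorithm. It is a generic textbook method chosen for illustration, not a description of how the HeavyDB operators are implemented, and it treats coordinates as planar (real GPS data would normally be projected first). The function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    # Build the lower hull left to right, dropping points that turn clockwise
    # or are collinear.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper hull right to left in the same way.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    cluster = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
    print(convex_hull(cluster))
```

Interior and collinear points are discarded, so only the corner points of the cluster remain as polygon vertices.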
Syntax and performance optimizations across all geometry types, table orderings, and commonly nested functions.
Significant functionality extension of window functions; define windows directly in temporal terms, which is particularly important in time series with missing observations. Window frame support allows improved control at the edges of windows.
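The difference between counting rows and counting time matters as soon as observations are missing. The sketch below uses Python's built-in sqlite3 module as a stand-in SQL engine to compare a row-based frame with a value-based (RANGE) frame over a series that has a gap. The readings table and its columns are hypothetical, and the exact frame syntax accepted by HeavyDB may differ from what SQLite accepts here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (day INTEGER, value REAL)")
# Days 4 and 5 are deliberately missing.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (6, 40.0), (7, 50.0)],
)

# ROWS counts physical rows; RANGE counts distance in the ORDER BY value.
# RANGE frames with numeric offsets need SQLite 3.28 or newer.
rows = conn.execute(
    """
    SELECT day,
           AVG(value) OVER (ORDER BY day
                            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),
           AVG(value) OVER (ORDER BY day
                            RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)
    FROM readings
    ORDER BY day
    """
).fetchall()

for day, avg_last_3_rows, avg_last_3_days in rows:
    print(day, avg_last_3_rows, avg_last_3_days)
```

On day 6 the row-based average still pulls in days 2 and 3, while the range-based average only sees day 6 itself, because no other reading falls within the preceding two days.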
Two new functions now support direct loading of LiDAR data: tf_point_cloud_metadata quickly searches tile metadata and helps you find data to import, and tf_load_point_cloud performs the import.
Network graph analytics functions have been added. These can work on networks alone, including non-geographic networks, or can find the least-cost path along a geographic network.
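As an illustration of what a least-cost path over a network means, the following Python sketch runs Dijkstra's algorithm on a small directed, weighted edge list. This is a generic algorithm chosen for the example; it does not show the signature or internals of the HEAVY.AI graph functions, and the function name and edge format are assumptions.

```python
import heapq


def least_cost_path(edges, source, target):
    """Dijkstra over directed (from, to, cost) edges with non-negative costs.

    Returns (path, total_cost), or (None, inf) if the target is unreachable.
    """
    graph = {}
    for u, v, cost in edges:
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, [])

    best = {source: 0.0}   # cheapest known cost to each node
    previous = {}          # back-pointers for path reconstruction
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))

    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[target]


if __name__ == "__main__":
    roads = [("A", "B", 4), ("A", "C", 1), ("C", "B", 2),
             ("B", "D", 5), ("C", "D", 8)]
    print(least_cost_path(roads, "A", "D"))
```

Nothing in the algorithm depends on geography: the costs could be road lengths, travel times, or weights in a non-geographic network.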
New spatial aggregation and smoothing functions. Aggregations work particularly well with LiDAR data; for example, you can pass through only the highest point within an area to create building or canopy height maps. Smoothing helps with noisy datasets and can reveal larger-scale patterns while minimizing visual distractions.
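The highest-point aggregation described above can be pictured as binning points into a grid and keeping one value per cell. The Python sketch below shows that idea on a handful of (x, y, z) points; it is a simplified illustration with a hypothetical function name and a square grid, not the interface of the built-in aggregation functions.

```python
import math


def max_height_grid(points, cell_size):
    """Keep only the highest z value seen in each square grid cell.

    points: iterable of (x, y, z) tuples; cell_size: cell edge length in the
    same units as x and y. Returns {(col, row): max_z}.
    """
    grid = {}
    for x, y, z in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if z > grid.get(cell, float("-inf")):
            grid[cell] = z
    return grid


if __name__ == "__main__":
    returns = [
        (0.5, 0.5, 2.0),   # ground return
        (0.9, 0.1, 7.5),   # rooftop return in the same cell
        (1.2, 0.3, 3.0),
        (1.8, 0.7, 1.0),
        (0.2, 1.5, 4.0),
    ]
    print(max_height_grid(returns, cell_size=1.0))
```

Five input points collapse to three cells, which is the kind of reduction that turns a dense point cloud into a compact height map.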
Release 6.1.0 features more granular administrative monitoring dashboards based on logs. These have been accessible in an open format on the server side, and now they are available in Immerse, by specific dashboards, users, or queries. Intermediate and advanced SQL support continues to mature, with INSERT, window functions, and UNION ALL.
This release contains a number of user interface polish items requested by customers. Cartography now supports polygons with colorful borders and transparent fills. Table presentation has been enhanced in various ways, from alignment to zebra striping. And dashboard saving reminders have been scaled back, based on customer feedback.
The extension framework now features an enhanced “custom source” dialog, as well as new SQL commands to see installed extensions and their parameters. We introduce three new extensions. The first, tf_compute_dwell_times, reduces GPS event stream data volumes considerably while keeping relevant information. The others compute feature similarity scores and are very general.
This release also includes initial public betas of our PostgreSQL Immerse connector, and SQL support for COPY FROM ODBC database connections, making it easier to connect to your enterprise data.
This release features large advances in data access, including intelligent linking to enterprise data (HeavyConnect) and support for raster geodata. SQL support includes high-performance string functions, as well as enhancements to window functions and table unions. Performance improvements are noticeable across the product, including fundamental advances in rendering, query compilation, and data transport. Our system administration tools have been expanded with a new Admin Portal, as well as additional system tables supporting detailed diagnostics. Major strides in extensibility include new charting options and a new extensions framework (beta).
Rebranded platform from OmniSci to HEAVY.AI, with OmniSciDB now HeavyDB, OmniSci Render now HeavyRender, and OmniSci Immerse now Heavy Immerse.
HeavyConnect allows the HEAVY.AI platform to work seamlessly as an accelerator for data in other data lakes and data warehouses. For Release 6.0, CSV and Parquet files on local file systems and in S3 buckets can be linked or imported. Other SQL databases are also supported via ODBC (beta).
HeavyConnect enables users to specify a data refresh schedule, which ensures access to up-to-date data.
Heavy Immerse now supports import of dozens of raster data formats, including geoTIFF, geoJPEG, and PNG. HeavySQL now supports almost any vector GIS file format.
Support is included for multidimensional arrays common in the sciences, including GRIB2, NetCDF, and HDF5.
Immerse now supports linking or import of files on the server filesystem (local or mounted). This helps prevent slow data transfers when client bandwidth is limited.
File globbing and filtering allow import of thousands of files at once.
New Gauge chart for easy visualization of key metrics relative to target thresholds.
New landing page and Help Center.
Enhanced mapping workflows with automated column picking.
Support for a wide range of performant string operations using a new string dictionary translation framework, as well as the ability to dictionary-encode none-encoded strings on the fly with a new ENCODE_TEXT operator.
Support for UNION ALL is now enabled by default, with significant performance improvements from the previous release (where it was beta flagged).
Significant functionality and performance improvements for window functions, including the ability to support expressions in PARTITION and ORDER clauses.
Parallel compilation of queries and a new multi-executor shared code cache provide up to 20% throughput/concurrency gains for interactive usage scenarios.
10X+ performance improvements in many cases for initial join queries via optimized Join Hash Table framework.
New result set recycler allows for expensive query sub-steps to be cached via the SQL hint /*+ keep_result */, which can significantly increase performance when a subquery is used across multiple queries.
Arrow execution endpoints now leverage the parallel execution framework, and Arrow performance has been significantly improved when high-cardinality dictionary-encoded text columns are returned.
Introduces a novel polygon rendering algorithm that does not require pre-triangulated or pre-grouped polygons and can render dynamically generated geometry on the fly (via ST_Buffer). The new algorithm is comparable to its predecessor in terms of both performance and memory and enables optimizations and enhancements in future releases.
New binary transport protocol to Heavy Immerse that significantly increases performance and interactivity for large result sets.
A new Admin Portal provides information on system resources usage and users.
System table support under a new information_schema database, containing 10 new system tables providing system statistics and memory and storage utilization.
New system and user-defined UDF framework (beta), comprising both row (scalar) and table (UDTF) functions, including the ability to define fast UDFs via Numba Python using the RBC framework, which are then inlined into the HeavyDB compiled query code for performant CPU and GPU execution.
System-provided table functions include generate_series for easy numeric series generation, tf_geo_rasterize_slope for fast geospatial binning and slope/aspect computation over elevation data, and others, with more capabilities planned for future releases.
Leveraging the new table function framework, a new HeavyRF module (licensed separately) includes tf_rf_prop and tf_rf_prop_max_signal table functions for fast radio frequency signal propagation analysis and visualization.
New Iframe chart type in Heavy Immerse to allow easier addition of custom chart types. (BETA)
Row-level security (RLS) can be used by an administrator to apply security filtering to queries run as a user or with a role.
Support for import from dozens of image and raster file types, such as jpeg, png, geotiff, and ESRI grid, including remote files.
Significantly more performant, parallelized window functions, executing up to 10X faster than in Release 5.9.
Automatic use of columnar output (instead of the default row-wise output) for large projections, reducing query times by 5-10X in some cases.
Support for full set of ST_TRANSFORM SRIDs supported by geos/proj4 library.
Support for numerous vector GIS files (100+ formats supported by current GDAL release).
Support for multidimensional array import from formats common in science and meteorology.
Improved Table chart export to access all data represented by a Table chart.
Introduced dashboard-level named custom SQL.
Significant speedup for POINT and fixed-length array imports and CTAS/ITAS, generally 5-20X faster.
The PNG encoding step of a render request is no longer a blocking step, providing improvement to render concurrency.
Adds support to hide legacy chart types from add/edit chart menu in preparation for future deprecation (defaults to off).
BETA - Adds custom expressions to table columns, allowing for reusable custom dimensions and measures within a single dashboard (defaults to off).
BETA - Adds Crosslink feature with Crosslink Panel UI, allowing crossfilters to fire across different data sources within the same dashboard (defaults to off).
BETA - Adds Custom SQL Source support and Custom SQL Source Manager, allowing the creation of a data source as a SQL statement (defaults to off).
Parallel execution framework is on by default. Running with multiple executors allows parts of query evaluation, such as code generation and intermediate reductions, to be executed concurrently. Currently available for single-node deployments.
Spatial joins between geospatial point types using the ST_Distance operator are accelerated using the overlaps hash join framework, with speedups up to 100x compared to Release 5.7.1.
Significant performance gains for many query patterns through optimization of query code generation, particularly benefitting CPU queries.
Window functions can now be executed without a partition clause being specified (to signify a partition encompassing all rows in the table).
Window functions can now execute over tables with multiple fragments and/or shards.
Native support for ST_Transform between all UTM Zones and EPSG:4326 (Lon/Lat) and EPSG:900913 (Web Mercator).
ST_Equals support for geospatial columns.
Support for the ANSI SQL WIDTH_BUCKET operator for easier and more performant numeric binning, now also used in Immerse for all numeric histogram visualizations.
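For readers unfamiliar with WIDTH_BUCKET, the Python sketch below reproduces the standard semantics for an ascending range: equal-width buckets numbered from 1, with bucket 0 for values below the range and bucket n + 1 for values at or above the upper bound. It is an illustration of the ANSI definition only; the function is a hypothetical stand-in, and the descending-range form of the operator is deliberately left out.

```python
from collections import Counter


def width_bucket(value, low, high, num_buckets):
    """Return the 1-based equal-width bucket for value over [low, high)."""
    if num_buckets <= 0 or low >= high:
        raise ValueError("need num_buckets > 0 and low < high")
    if value < low:
        return 0                   # underflow bucket
    if value >= high:
        return num_buckets + 1     # overflow bucket
    index = int((value - low) / (high - low) * num_buckets)
    # Guard against floating-point rounding at the upper edge.
    return min(index, num_buckets - 1) + 1


if __name__ == "__main__":
    values = [1, 3, 3.5, 7, 9.9]
    histogram = Counter(width_bucket(v, 0, 10, 5) for v in values)
    for bucket in sorted(histogram):
        print(bucket, histogram[bucket])
```

Because each value maps to its bucket with a single arithmetic expression, a histogram needs only one pass and a group-by on the bucket number.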
The Vulkan backend renderer is now enabled by default. The legacy OpenGL renderer is still available as a fallback if there are blocking issues with Vulkan. You can disable the Vulkan renderer using the renderer-use-vulkan-driver=false configuration flag.
Vulkan provides improved performance, memory efficiency, and concurrency.
You are likely to see some performance and memory footprint improvements with Vulkan in Release 5.8, most significantly in multi-GPU systems.
Support for file path regex filter and sort order when executing the COPY FROM command.
New ALTER SYSTEM CLEAR commands that enable clearing CPU or GPU memory from Immerse SQL Editor or any other SQL client.
Extensive enhancements to Immerse support for parameters. Parameters can now be used in chart column selectors, chart filters, chart titles, global filters, and dashboard titles. Dashboards can have parameter widgets embedded on them, side-by-side with charts. Parameter values are visible in chart axes/labels, legends, and tooltips, and you can toggle parameter visibility.
In Immerse Pointmap charts, you can specify which color-by attribute always renders on top, which is useful for highlighting anomalies in data.
Significantly faster and more accurate "lasso" tool filters geospatial data on Immerse Pointmap charts, leveraging native geospatial intersection operations.
Immerse 3D Pointmap chart and HTML support in text charts are available as a beta feature.
Airplane symbol shape has been added as a built-in mark type for the Vega rendering API.
Vega symbol and multi-GPU polygon renders have been made significantly faster.
User-interrupt of query kernels is now on by default. Queries can be interrupted using Ctrl + C in omnisql, or by calling the interrupt API.
Parallel executors are in public beta (set with the --num-executors flag).
Support for APPROX_QUANTILE aggregate.
Support for default column values when creating a table and across all append endpoints, including COPY TO, INSERT INTO TABLE SELECT, INSERT, and binary load APIs.
Faster and more robust ability to return result sets in Apache Arrow format when queried from a remote client (i.e. non-IPC).
More performant and robust high-cardinality group-by queries.
ODBC driver now supports Geospatial data types.
Custom SQL dimensions, measures, and filters can now be parameterized in Immerse, enabling more flexible and powerful scenario analysis, projections, and comparison use cases.
New angle measure added to Pointmap and Scatter charts, allowing orientation data to be visualized with wedge and arrow icons.
Custom SQL modal with validation and column name display now enabled across all charts in Immerse.
Significantly faster point-in-polygon joins through a new range join hash framework.
Approximate Median function support.
INSERT and INSERT FROM SELECT now support specification of a subset of columns.
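Specifying a column subset means the statement names only the columns it supplies. The sketch below demonstrates both forms using Python's built-in sqlite3 module as a stand-in engine. The trips and staging tables are hypothetical, and the way omitted columns are filled (declared defaults here) follows standard SQL as implemented by SQLite; it is an assumption that HeavyDB behaves the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE trips (
        trip_id INTEGER,
        city    TEXT DEFAULT 'unknown',
        fare    REAL DEFAULT 0.0
    )
    """
)
conn.execute("CREATE TABLE staging (trip_id INTEGER, fare REAL)")
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(2, 12.5), (3, 8.0)])

# INSERT with a column subset: fare is omitted and takes its default.
conn.execute("INSERT INTO trips (trip_id, city) VALUES (1, 'Austin')")

# INSERT from SELECT with a column subset: city is omitted.
conn.execute(
    "INSERT INTO trips (trip_id, fare) SELECT trip_id, fare FROM staging"
)

rows = conn.execute(
    "SELECT trip_id, city, fare FROM trips ORDER BY trip_id"
).fetchall()
for row in rows:
    print(row)
```

Naming the target columns also makes load statements robust to later schema changes, since new columns added to the table do not break existing inserts.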
Automatic metadata updates and vacuuming for optimizing space usage.
Significantly improved OmniSciDB startup time, as well as a number of significant load and performance improvements.
Improvements to line and polygon stroke rendering and point/symbol rendering.
Ability to set annotations on New Combo charts for different dimension/measure combinations.
New ‘Arrow-over-the-wire’ capability to deliver result sets in Apache Arrow format, with ~3x performance improvement over Thrift-based result set serialization.
Support for concurrent SELECT and UPDATE/DELETE queries for single-node installations.
Initial OmniSci Render support for CPU-only query execution ("Query on CPU, render on GPU"), allowing for a wider set of deployment infrastructure choices.
Cap metadata stored on previous states of a table by using MAX_ROLLBACK_EPOCHS, improving performance for streaming and small batch load use cases and modulating table size on disk.
Added initial compilation support for NVIDIA Ampere GPUs.
Improved performance for UPDATE and DELETE queries.
Improved the performance of filtered group-by queries on large-cardinality string columns.
Added SQL function SAMPLE_RATIO, which takes a proportion between 0 and 1 as an input argument and filters rows to obtain a sampling of a dataset.
Added support for exporting geo data in GeoJSON format.
Dashboard filter functionality is expanded, and filters can be saved as views.
You can perform bulk actions on the dashboard list.
New UI Setting panel in Immerse for customizing charts.
Tabbed dashboards.
SQL Editor now handles Vega JSON requests.
New Combo chart type in Immerse provides increased configurability and flexibility.
Immerse chart-specific filters and quick filters add increased flexibility and speed.
Updated Immerse Filter panel provides a Simple mode and Advanced mode for viewing and creating filters.
On multilayer charts, layer visibility can be set by zoom level.
Different map charts can be synced together for pan and zoom actions, regardless of data source.
Support for the Array type over JDBC.
SELECT DISTINCT in UNION ALL is supported. (UNION ALL is prerelease and must be explicitly enabled.)
Support for joins on DECIMAL types.
Performance improvements on CUDA GPUs, particularly Volta and Turing.
NULL support for geospatial types, including in ALTER TABLE ADD COLUMN.
SQL SHOW commands: SHOW TABLES, SHOW DATABASES, SHOW CREATE TABLE, and SHOW USER SESSIONS.
Ability to perform updates and deletes on temporary tables.
Updates to JDBC driver, including escape syntax handling for the fn keyword and added support to get table metadata.
Notable performance improvements, particularly for join queries, projection queries with order by and/or limit, queries with scalar subqueries, and multicolumn group-by queries.
Query interrupt capability improved to allow canceling long-running queries; interrupts are now also supported over JDBC.
Completely overhauled SQL Editor, including query formatting, snippets, history and more.
Database switching from within Immerse, as well as dashboard URLs that contain the database name.
Over 50% reduction in load times for the dashboards list initial load and search.
Cohort builder now supports count (# records) in aggregate filter.
Improved error handling and more meaningful error messages.
Custom logos can now be configured separately for light and dark themes.
Logos can be configured to deep-link to a specific URL.
Added support for UPDATE via JOIN with a subquery in the WHERE clause.
Initial support for TEMPORARY (that is, non-persistent) tables.
Improved performance for multi-column GROUP BY queries, as well as single column GROUP BY queries with high cardinality. Performance improvement varies depending on data volume and available hardware, but most use cases can expect a 1.5 to 2x performance increase over OmniSciDB 5.0.
Improved support for EXISTS and NOT EXISTS subqueries.
Added support for LINESTRING, POLYGON, and MULTIPOLYGON in user defined functions.
Immerse log-ins are fully sessionized and persist across page refreshes.
Pie chart now supports "All Others" and percentage labels.
Cohorts can now be built with aggregation-based filters.
New filter sets can be created through duplicating existing filter sets.
Dashboard URLs now link to individual filter sets.
The new filter panel in Immerse enables the ability to toggle filters on and off, and introduces Filter Sets to provide quick access to different sets of filters in one dashboard.
Immerse now supports using global and cross-filters to interactively build cohorts of interest, and the ability to apply a cohort as a dashboard filter, either within the existing filter set or in a new filter set.
Data Catalog, located within Data Import, is a repository of datasets that users can use to enhance existing analyses.
To see these new features in action, please watch this video from Converge 2019, where Rachel Wang demonstrates how you can use them.
Added support for binary dump and restore of database tables.
Added support for compile-time registered user-defined functions in C++, and experimental support for runtime user-defined SQL functions and table functions in Python via the Remote Backend Compiler.
Support for some forms of correlated subqueries.
Support for update via subquery, to allow for updating a table based on calculations performed on another table.
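An update via subquery typically looks like the pattern below, where each row of one table is set from an aggregate over another. The example runs against Python's built-in sqlite3 module as a stand-in engine, with hypothetical stores and sales tables. The statement is standard SQL; whether HeavyDB accepts this exact correlated form is an assumption rather than something confirmed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stores (store_id INTEGER, total_sales REAL)")
conn.execute("CREATE TABLE sales (store_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO stores VALUES (?, ?)", [(1, 0.0), (2, 0.0), (3, 0.0)]
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [(1, 10.0), (1, 15.0), (2, 7.5)]
)

# Set each store's total from a calculation performed on the sales table.
# COALESCE covers stores with no sales, where SUM returns NULL.
conn.execute(
    """
    UPDATE stores
    SET total_sales = (
        SELECT COALESCE(SUM(amount), 0.0)
        FROM sales
        WHERE sales.store_id = stores.store_id
    )
    """
)

rows = conn.execute(
    "SELECT store_id, total_sales FROM stores ORDER BY store_id"
).fetchall()
for row in rows:
    print(row)
```

The update happens entirely inside the database, so no intermediate result has to be exported and reloaded to refresh the summary column.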
Multistep queries that generate large, intermediate result sets now execute up to 2.5x faster by leveraging new JIT code generator for reductions and optimized columnarization of intermediate query results.
Frontend-rendered choropleths now support the selection of base map layers.