Compute.AI SQL Support
ComputeAI provides AWS Glue catalog support for Hive and Iceberg Tables.
USE <Glue Catalog Name>;
You can query parquet files directly with SQL.
select * from parquet.`file_path`
Parameters:
Example: select * from parquet.`s3a://jitsuaidata/datasets/tpch/sf10/parquet/partitioned/customer/` limit 5;
ComputeAI provides the ability to query Iceberg Tables backed by AWS Glue catalog.
USE <Iceberg Glue Catalog Name>;
SELECT * FROM <Iceberg Catalog Name>.<Iceberg DB Name>.<Table Name>;
Create a new table containing the result of a select query. The optional IF NOT EXISTS clause causes the error to be suppressed if the table already exists.
CREATE TABLE [ IF NOT EXISTS ] table_name [(column_alias, ... )]
AS query
Example: create table customer as select * from parquet.`s3a://jitsuaidata/datasets/tpch/sf10/parquet/partitioned/customer`;
Lists the databases that match an optionally supplied regular expression pattern.
SHOW { DATABASES | SCHEMAS } [ LIKE regex_pattern ]
USE statement is used to set the current database. After the current database is set, the unqualified database artifacts such as tables, functions and views that are referenced by SQLs are resolved from the current database.
USE database_name
The SHOW TABLES statement returns all the tables for an optionally specified database.
SHOW TABLES [{ FROM | IN } database_name ][ LIKE regex_pattern ]
DESCRIBE TABLE statement returns the basic metadata of a table.
{ DESC | DESCRIBE } [ TABLE ] [ format ] table_identifier [ partition_spec ] [ col_name ]
DROP TABLE deletes the table and removes the directory associated with the table from the file system if the table is not EXTERNAL table.
DROP TABLE [ IF EXISTS ] table_identifier
ComputeAI supports a SELECT statement and conforms to the ANSI SQL standard. Queries are used to retrieve result sets from one or more tables.
[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expr [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ { LIMIT [ count | ALL ] } ]
where from_item is one of:
and join_type is one of:
The WITH clause defines named relations for use within a query. It allows flattening nested queries or simplifying subqueries.
The GROUP BY clause aggregates the output of a SELECT statement into groups of rows containing matching values. A simple GROUP BY clause may contain any expression composed of input columns or it may be an ordinal number selecting an output column by position.
When a GROUP BY clause is used in a SELECT statement all output expressions must be either aggregate functions or columns present in the GROUP BY clause.
The HAVING clause is used in conjunction with aggregate functions and the GROUP BY clause to control which groups are selected. A HAVING clause eliminates groups that do not satisfy the given conditions. HAVING filters groups after groups and aggregates are computed.
The ORDER BY clause is used to sort a result set by one or more output expressions:
ORDER BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...]
The LIMIT clause restricts the number of rows in the result set. LIMIT ALL is the same as not having a LIMIT clause.
LIMIT { count | ALL }
Joins allow you to combine data from multiple relations. Join types supported:
The WHERE clause is used to limit the results of the FROM clause of a query or a subquery based on the specified condition.
WHERE boolean_expression
A subquery is an expression which is composed of a query. The subquery is correlated when it refers to columns outside of the subquery. Logically, the subquery will be evaluated for each row in the surrounding query. The referenced columns will thus be constant during any single evaluation of the subquery.
The EXISTS predicate determines if a subquery returns any rows.
The IN predicate determines if any values produced by the subquery are equal to the provided expression. The result of IN follows the standard rules for nulls. The subquery must produce exactly one column.
A scalar subquery is a non-correlated subquery that returns zero or one row. It is an error for the subquery to produce more than one row. The returned value is NULL if the subquery produces no rows.
ComputeAI supports all standard built-in functions for strings, date/time, sort, windows and aggregates.
Union allows you to combine two or more SELECT statements. Both union and union all are supported.
SELECT col_name FROM table UNION [ALL | DISTINCT] SELECT col_name_2 FROM table_2;
Window (also known as a window function or analytic function) is a type of function that performs a calculation across a set of rows that are related to the current row.
SELECT RANK() OVER (PARTITION BY col_name ORDER BY col_name_2) FROM table;
Supported Rank functions
Supported Aggregate functions
The EXPLAIN statement is used to provide logical plans for an input statement.
EXPLAIN [ EXTENDED | FORMATTED ] statement