Connecting to Spark SQL

Operation Steps

Please follow the steps below to connect to a Spark SQL data source.

  1. On the data connection page, click "New Data Connection" in the upper right corner.

  2. In the data source types, select the Spark SQL data source.

  3. Fill in the required parameters for the data source connection as prompted.

    Connection Configuration Information Description

    Name: The name of the connection. Required and must be unique per user.
    Host Address: The address of the database. If the URL field is filled in, the URL takes precedence.
    Port: The port of the database. If the URL field is filled in, the URL takes precedence.
    Username: The username for the database.
    Password: The password for the database.
    Database: The name of the database.
    Max Connections: The maximum number of connections in the connection pool.
    Encoding: The encoding used for the database connection.
    Prefer using database comment as dataset title: Controls whether the table name or the table comment is displayed as the dataset title. When enabled, the table comment is shown; when disabled, the table name is shown.
    URL: The JDBC URL of the database.
    Hierarchical loading of schema and tables: Disabled by default. When enabled, schemas and tables are loaded hierarchically: only schemas are loaded when connecting, and you must click a schema to load the tables under it.
    Query Timeout (seconds): Defaults to 600. For large data volumes, increase the timeout as needed.
    Show only tables under the specified database/schema: When this option is selected and the Database field is not empty, only tables under the specified database are shown.
  4. After filling in the parameters, click the Validate button to check the connection (this verifies connectivity between HENGSHI SENSE and the configured data connection; the connection cannot be added if validation fails).

  5. After validation passes, click Execute Preset Code; the preset code for this data source pops up. Click the execute button to run it.

  6. Click the Add button to add the Spark SQL connection.
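As a rough illustration of how the connection fields relate to the URL field: the Spark Thrift Server speaks the HiveServer2 protocol, so its JDBC URLs conventionally use the `jdbc:hive2://` scheme. The helper below, which assembles such a URL from the Host Address, Port, and Database fields, is a hypothetical sketch, not part of the product; confirm the exact URL shape against your deployment.

```python
# Hypothetical helper: assembles a JDBC URL of the kind the URL field
# typically expects for a Spark Thrift Server (HiveServer2-compatible).
def build_jdbc_url(host: str, port: int, database: str = "default") -> str:
    return f"jdbc:hive2://{host}:{port}/{database}"

url = build_jdbc_url("spark.example.com", 10000, "sales")
print(url)  # jdbc:hive2://spark.example.com:10000/sales
```

Remember that when the URL field is filled in, it takes precedence over the separate Host Address and Port fields.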

Please Note

  1. Parameters marked with * are required; others are optional.
  2. When connecting to a data source, you must execute the preset code. Failure to do so will result in certain functions being unavailable during data analysis. In addition, when upgrading from a version prior to 4.4 to 4.4, you need to execute the preset code for existing data connections in the system.

Supported Spark SQL Versions

2.3.0 and above
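Since only Spark SQL 2.3.0 and above is supported, a quick version check before configuring the connection can save a failed validation. The sketch below compares a plain `major.minor.patch` version string against that minimum; it is an illustration only and does not handle pre-release suffixes.

```python
# Minimal sketch: check whether a Spark version string meets the
# documented minimum of 2.3.0. Assumes a plain "major.minor.patch" string.
MIN_SUPPORTED = (2, 3, 0)

def is_supported_spark(version: str) -> bool:
    parts = tuple(int(p) for p in version.split(".")[:3])
    parts += (0,) * (3 - len(parts))  # pad "2.3" to (2, 3, 0)
    return parts >= MIN_SUPPORTED

print(is_supported_spark("3.4.1"))  # True
print(is_supported_spark("2.2.0"))  # False
```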

Data Connection Preview Support

Supports all tables that can be listed by SHOW TABLES.

SQL Support in SQL Datasets

Only SELECT statements are supported, along with all SELECT-related features supported by the connected Spark instance. Users need to ensure that the syntax complies with the Spark SQL standard.
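Because only SELECT statements are accepted, it can be useful to pre-check SQL before submitting it as a dataset. The guard below is a hypothetical client-side sketch of that rule (HENGSHI SENSE itself enforces the restriction); it strips SQL comments and checks the first keyword, treating only `SELECT` as valid, which is a simplification.

```python
import re

# Hypothetical pre-check mirroring the SELECT-only rule for SQL datasets.
# Removes line comments (-- ...) and block comments (/* ... */), then
# inspects the statement's first keyword.
def is_select_statement(sql: str) -> bool:
    no_comments = re.sub(r"--[^\n]*|/\*.*?\*/", " ", sql, flags=re.S)
    words = no_comments.strip().split(None, 1)
    return bool(words) and words[0].upper() == "SELECT"

print(is_select_statement("-- report\nSELECT id FROM orders"))  # True
print(is_select_statement("DROP TABLE orders"))                 # False
```

Note this sketch would also reject `WITH ... SELECT` common table expressions; whether those count as SELECT statements here is determined by the connected Spark instance, not by this check.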

Supported Authentication Methods for Connecting to Spark Thrift Server

Supports username and password authentication; SSL is not supported.

Unsupported Field Types

The following data types in Spark cannot be processed correctly:

  • BINARY
  • ARRAY
  • MAP
  • STRUCT
  • UNION
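When preparing tables for analysis, columns of these types should be excluded or cast to supported types upstream. The snippet below is an illustrative sketch of such a filter over a name-to-type mapping; the simplified type strings (real Spark DDL strings carry parameters, e.g. `array<int>`) and the helper itself are assumptions, not product behavior.

```python
# Illustrative sketch: drop columns whose Spark type the platform cannot
# process, based on the unsupported-type list above.
UNSUPPORTED_PREFIXES = ("binary", "array", "map", "struct", "union")

def usable_columns(schema: dict[str, str]) -> dict[str, str]:
    return {
        name: dtype
        for name, dtype in schema.items()
        if not dtype.lower().startswith(UNSUPPORTED_PREFIXES)
    }

schema = {"id": "bigint", "tags": "array<string>", "name": "string"}
print(usable_columns(schema))  # {'id': 'bigint', 'name': 'string'}
```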

User Manual for Hengshi Analysis Platform