JDBC Support for Database Sharding (2024)

Modern web applications face new scalability challenges with huge volumes of data. A commonly accepted solution to this problem is sharding. Sharding is a data tier architecture, where data is horizontally partitioned across independent databases. Each database in such a configuration is called a shard. All shards together make up a single logical database, which is referred to as a sharded database (SDB). Sharding is a shared-nothing database architecture because shards do not share physical resources such as CPU, memory, or storage devices.

Sharding uses Global Data Services (GDS), where GDS routes a client request to an appropriate database based on parameters such as availability, load, network latency, and replication lag. A GDS pool is a set of replicated databases that offer the same global service. The databases in a GDS pool can be located in multiple data centers across different regions. A sharded GDS pool contains all shards of a sharded database and their replicas, and appears as a single sharded database to database clients.

Starting from Oracle Database 12c Release 2 (12.2.0.1), Oracle JDBC supports database sharding. The JDBC driver recognizes the specified sharding key and super sharding key and connects to the relevant shard that contains the data. Once a connection to a shard is established, database operations such as DML statements and SQL queries are supported and executed in the usual way. The following sections define the sharding terms used in this guide:

See Also:

Oracle Database Administrator’s Guide

Sharding, Shard, and Sharded Database

Sharding is a data tier architecture where data is horizontally partitioned across independent databases. Each database in such a configuration is called a shard. All shards together make up a single logical database, which is referred to as a sharded database (SDB).

Sharding Key, Composite Sharding Key, and Super Sharding Key

A sharding key is a partitioning key used in single-level sharding by range, list, or consistent hash. All sharding keys together are referred to as the composite sharding key. A super sharding key is the partitioning key used in composite sharding for the top-level sharding by range or list. Both the sharding key and the super sharding key can contain one or more columns that determine the shard where each row is stored. A sharding key can be of type VARCHAR2, CHAR, DATE, NUMBER, TIMESTAMP, and so on.
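To make the two levels of composite sharding concrete, the following is a minimal Java sketch, not Oracle's implementation. The class `CompositeRouter`, the shard names a caller supplies, and the use of `String.hashCode()` as a stand-in for consistent hashing are all assumptions made for illustration. The super sharding key first selects a group of shards by list, and the sharding key then selects one shard within that group.

```java
import java.util.List;
import java.util.Map;

// Hypothetical two-level router: list at the top level, hash below it.
final class CompositeRouter {
    // Top level: super sharding key value -> group of shards (sharding by list).
    private final Map<String, List<String>> shardsBySuperKey;

    CompositeRouter(Map<String, List<String>> shardsBySuperKey) {
        this.shardsBySuperKey = shardsBySuperKey;
    }

    // Returns the name of the shard that would hold the row for these keys.
    String route(String superShardingKey, String shardingKey) {
        List<String> group = shardsBySuperKey.get(superShardingKey);
        if (group == null) {
            throw new IllegalArgumentException(
                    "No shards listed for super sharding key " + superShardingKey);
        }
        // Second level: a plain hash stands in for consistent hashing here.
        int index = Math.floorMod(shardingKey.hashCode(), group.size());
        return group.get(index);
    }
}
```

With a map such as `"silver" -> ["silver-shard-1"]`, every row whose super sharding key is `silver` lands on that single shard, whatever its sharding key.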

For JDBC users, it is recommended to pass sharding keys and super sharding keys when obtaining connections from the database. Alternatively, sharding keys can be provided in the connection string as a separate attribute under CONNECT_DATA. However, passing the sharding key in the connection string restricts the connections to a single shard, so this approach is not recommended. The following snippet shows how to provide sharding keys as a separate attribute under CONNECT_DATA in the connection string:

(DESCRIPTION=(…)(CONNECT_DATA=(SERVICE_NAME=ORCL)(SHARDING_KEY=…)(SUPER_SHARDING_KEY=...)))

Note:

You must provide the sharding key in a format that complies with the NLS formatting specified in the database.
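As a sketch of the recommended approach (passing the keys when obtaining the connection rather than in the connection string), the example below uses the standard JDBC 4.3 interfaces `java.sql.ShardingKey` and `java.sql.ConnectionBuilder`, available from Java 9. It assumes a driver and `DataSource` that implement these methods; one that does not will throw `SQLFeatureNotSupportedException`. The class name `ShardedConnections` and the choice of an e-mail address and a region as key columns are hypothetical.

```java
import java.sql.Connection;
import java.sql.JDBCType;
import java.sql.SQLException;
import java.sql.ShardingKey;
import javax.sql.DataSource;

final class ShardedConnections {
    private ShardedConnections() {}

    /**
     * Opens a connection routed by the given keys.
     * Using an e-mail as the sharding key and a region as the super
     * sharding key is an assumption made for this example only.
     */
    static Connection connectToShard(DataSource dataSource,
                                     String customerEmail,
                                     String region) throws SQLException {
        // Each key is built from one or more typed subkeys (columns).
        ShardingKey shardingKey = dataSource.createShardingKeyBuilder()
                .subkey(customerEmail, JDBCType.VARCHAR)
                .build();
        ShardingKey superShardingKey = dataSource.createShardingKeyBuilder()
                .subkey(region, JDBCType.VARCHAR)
                .build();

        // The keys travel with the connection request, not the connect string.
        return dataSource.createConnectionBuilder()
                .shardingKey(shardingKey)
                .superShardingKey(superShardingKey)
                .build();
    }
}
```

A caller would typically hold the returned connection in a try-with-resources block and run ordinary SQL on it, since operations on a shard execute in the usual way.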

Multi Shard Queries

Multi Shard Queries enable routing and processing of queries and transactions that access data stored on multiple shards. Multi Shard Queries are executed without a sharding key. Multi Shard Operations are used for simple aggregation of data and reporting across shards.

Shard Catalog

The shard catalog is a special database that stores sharded database configuration data and supports multi shard queries. It also helps in the centralized management of a sharded database.

Shard Director

A shard director is a specific implementation of a global service manager (GSM) that acts as a regional listener for clients that connect to an SDB and maintains a current topology map of the SDB. Based on the sharding key passed during a connection request, it routes the connections to the appropriate shard.

Shard Topology

Shard topology is the set of sharding key range mappings stored in a particular shard. Universal Connection Pool (UCP) can cache the shard topology, which enables it to bypass the shard director when establishing connections to shards. As a result, applications built on UCP get a fast path to shards.
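The idea of a cached topology can be sketched as a sorted map from key ranges to shards. This is an illustrative model only, not how UCP is implemented: the class `ShardTopologyCache`, its method names, and the representation of a range by its inclusive lower bound over a numeric key hash are assumptions. A hit returns the shard directly; an empty result represents a miss, in which case the request would go through the shard director.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical client-side cache of sharding key range mappings.
final class ShardTopologyCache {
    // Inclusive lower bound of each key range -> name of the shard holding it.
    private final TreeMap<Long, String> rangeStartToShard = new TreeMap<>();

    // Records that keys from rangeStartInclusive upward (until the next
    // recorded lower bound) live on shardName.
    void learnRange(long rangeStartInclusive, String shardName) {
        rangeStartToShard.put(rangeStartInclusive, shardName);
    }

    // Finds the range with the greatest lower bound <= keyHash.
    Optional<String> shardFor(long keyHash) {
        Map.Entry<Long, String> entry = rangeStartToShard.floorEntry(keyHash);
        return entry == null ? Optional.empty() : Optional.of(entry.getValue());
    }
}
```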

See Also:

Oracle Universal Connection Pool Developer’s Guide

Chunk

A chunk is a single partition from each table of a table family. It is a unit of data migration between shards.

Chunk Split

Chunk split is the process required when a chunk becomes too big or when only part of a chunk needs to be migrated to another shard.

Chunk Migration

Chunk migration is the process of moving a chunk from one shard to another when data or workload skew occurs, without any change in the number of shards. A DBA initiates it to eliminate hot spots.

Resharding

Resharding is the process of redistributing data between shards, triggered by a change in the number of shards. Chunks are moved between shards so that they are evenly distributed across shards. However, the content of the chunks does not change; that is, no rehashing takes place during resharding.
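The distinction between moving chunks and rehashing keys can be shown with a small model. This is a simplified sketch under stated assumptions, not Oracle's algorithm: the class `ChunkMap` is hypothetical, the number of chunks is fixed up front, and `String.hashCode()` modulo the chunk count stands in for consistent hashing. Adding a shard changes only the chunk-to-shard assignment; the chunk that a given key belongs to stays the same.

```java
// Hypothetical model: keys map to chunks once, chunks map to shards and can move.
final class ChunkMap {
    private final int[] chunkToShard; // index = chunk id, value = shard id
    private int shardCount;

    ChunkMap(int chunkCount, int initialShardCount) {
        chunkToShard = new int[chunkCount];
        shardCount = initialShardCount;
        for (int chunk = 0; chunk < chunkCount; chunk++) {
            chunkToShard[chunk] = chunk % initialShardCount;
        }
    }

    // Key -> chunk depends only on the key and the fixed chunk count.
    int chunkOf(String key) {
        return Math.floorMod(key.hashCode(), chunkToShard.length);
    }

    int shardOf(String key) {
        return chunkToShard[chunkOf(key)];
    }

    // Adds one shard and moves whole chunks to it until it holds its even
    // share (rounded down). Returns the number of chunks moved.
    int addShard() {
        int newShard = shardCount++;
        int target = chunkToShard.length / shardCount;
        int[] owned = new int[shardCount];
        for (int shard : chunkToShard) {
            owned[shard]++;
        }
        int moved = 0;
        for (int chunk = 0; chunk < chunkToShard.length && moved < target; chunk++) {
            int owner = chunkToShard[chunk];
            if (owned[owner] > target) { // take only from overloaded shards
                chunkToShard[chunk] = newShard;
                owned[owner]--;
                moved++;
            }
        }
        return moved;
    }
}
```

With 12 chunks on 2 shards, adding a third shard moves 4 chunks in this model, and `chunkOf` returns the same value for any key before and after the move.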


FAQs

Which database supports sharding?

Cassandra, HBase, HDFS, MongoDB, and Redis support sharding. SQLite, Memcached, ZooKeeper, MySQL, and PostgreSQL do not natively support sharding at the database layer. For databases that do not offer built-in support, the sharding logic has to reside in the application.

What is the sharding key in JDBC?

A sharding key is the partitioning key whose columns determine the shard where each row is stored; the JDBC driver uses the specified sharding key to connect to the relevant shard. A sharding key can be of type VARCHAR2, CHAR, DATE, NUMBER, TIMESTAMP, and so on. You must provide the sharding key in a format that complies with the NLS formatting specified in the database.

Does MySQL support sharding?

MySQL NDB Cluster automatically shards (partitions) tables across nodes, enabling databases to scale horizontally on low cost, commodity hardware to serve read and write-intensive workloads, accessed both from SQL and directly via NoSQL APIs.

Does PostgreSQL support sharding?

In fact, PostgreSQL has implemented sharding on top of partitioning by allowing any given partition of a partitioned table to be hosted by a remote server. The basis for this is in PostgreSQL's Foreign Data Wrapper (FDW) support, which has been a part of the core of PostgreSQL for a long time.

What is the difference between Sharding-JDBC and Sharding-Proxy?

Sharding-JDBC adopts a decentralized architecture and suits high-performance, lightweight OLTP applications developed in Java. Sharding-Proxy provides a static entry point and supports all languages, which suits OLAP applications and the management and operation of sharded databases.

Is sharding only for SQL?

No. Sharding is also a core feature of many NoSQL databases, which are designed from the ground up to support horizontal scalability and distributed data storage.

Does MongoDB use sharding?

Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations. Database systems with large data sets or high throughput applications can challenge the capacity of a single server.

What are the best practices for sharding in MySQL?

Choose the right sharding key: The sharding key determines how data is distributed across shards. It should be carefully chosen to evenly distribute the data and avoid hotspots. Common sharding keys include user IDs, timestamps, or geographical locations.

Can NoSQL databases be sharded?

Sharding can offer several benefits for NoSQL databases, such as scalability, performance, and availability. It can help scale the database horizontally by adding more servers or nodes as the data grows, thus reducing load and bottlenecks on a single server and increasing throughput and storage capacity.

Is PostgreSQL obsolete?

According to the official PostgreSQL versioning policy, the final PostgreSQL 11 release was scheduled for November 9, 2023. Because no releases are planned after that date, PostgreSQL 11 has effectively reached its end of life. This applies to version 11 only, not to PostgreSQL as a whole.

Does Neo4j support sharding?

An existing database can be sharded with the help of the neo4j-admin database copy command. For an example, see Sharding data with the copy command.

What is the difference between sharding and partitioning?

Sharding and partitioning are techniques to divide and scale large databases. Sharding distributes data across multiple servers, while partitioning splits tables within one server.

Does Oracle support sharding?

Oracle Sharding supports system-managed, user-defined, and composite sharding methods. System-managed sharding does not require you to map data to shards. The data is automatically distributed across shards using partitioning by consistent hash.

Does Cassandra support sharding?

Both Cassandra and MongoDB allow sharding—a technique to horizontally partition data across multiple nodes in a cluster.

Article information

Author: Manual Maggio
