ProgrammingFullstack Developer

How to work with arrays and array-like structures in SQL for storing and analyzing multiple values in a single cell, and when is such an approach justified?

Pass interviews with Hintsage AI assistant

Answer.

Background

Classic SQL does not provide for storing multiple values in a single cell — the relational model requires normalization. However, in modern tasks, fields like "list of tags", "rating scale" are often encountered where it is convenient to operate with multiple values at the level of a single row. Some DBMS (PostgreSQL, Oracle) provide ARRAY data types or similar mechanisms.

Problem

Using arrays violates the principle of normalization, complicates many operations (filtering, updating, indexing), and also makes the code less portable between DBMS. But it can be convenient or unavoidable — for example, for caching or fast searching within small lists of values.

Solution

  • In PostgreSQL, array support is native. Example:
CREATE TABLE products ( id SERIAL PRIMARY KEY, tags TEXT[] ); -- Insertion: INSERT INTO products(tags) VALUES (ARRAY['eco','sale','hot']); -- Searching in the array: SELECT * FROM products WHERE 'eco' = ANY (tags);
  • In MySQL 5.x, there are no arrays; JSON or delimited strings and functions for parsing are often used instead.
  • In Oracle — collections, nested table/varray.
  • For optimal analytical tasks, it is better to normalize (create a related secondary table product_tags) and use JOIN, storing arrays only in special cases (performance or specific requirements).

Key features:

  • Convenient when an array is really needed and the DBMS supports it.
  • Issues with indexing and filtering with large arrays.
  • Not portable between DBMS, complicates maintenance.

Tricky Questions.

Can individual elements of an array be indexed?

In PostgreSQL — yes, via GIN/GIST indexes:

CREATE INDEX idx_tags ON products USING GIN (tags);

How to check for the presence of a value in an array in a string column through a delimiter faster?

SQL doesn’t inherently support this, they use pattern matching:

SELECT * FROM users WHERE ',admin,' like concat('%,',role,',%');

But this approach is unreliable and slow.

How many values can be stored in an array, and what limits it?

The limit depends on the DBMS — for example, in PostgreSQL, the limit is only on the row size (1–2 MB).

Common Mistakes and Anti-patterns

  • Storing arrays in a single cell for "simplicity" and complicating analysis
  • Incorrectly filtering values using LIKE without considering delimiters
  • Relying on uniqueness and indexing across string-arrays

Real-life example

Negative case

In an ecommerce project, product tags were decided to be stored as a comma-separated string in one column. This severely hindered fast product search by tag, causing filter errors, and tag duplication occurred due to parsing errors.

Pros:

  • "Simple" and quick to implement

Cons:

  • Very slow at scale, hard to maintain, impossible to guarantee value uniqueness

Positive case

In PostgreSQL for small, immutable sets (user roles), ARRAY and a GIN index were used. For larger ones — a separate roles table.

Pros:

  • Fast search through ARRAY via index
  • Maintains compatibility with the relational model where needed

Cons:

  • Not portable, requires knowledge of advanced DBMS features