Working with hierarchical structures is a classic case for relational databases. Examples include: directories, tree menus, organizational structures.
Background: Databases use a tabular model. There are several approaches for tree-like data, each with its advantages and disadvantages.
Problem: The standard parent_id model leads to slow recursive SELECTs. Real tasks (finding all descendants, paths, subtree counting) require optimization.
Solutions:
Example of code for Materialized Path (PostgreSQL):
CREATE TABLE categories ( id SERIAL PRIMARY KEY, name TEXT, path TEXT ); -- Example of storing path: '1/2/5/' (root/subcategory/current) SELECT * FROM categories WHERE path LIKE '1/2/%'; -- all descendants of 2
Example of code for Nested Sets:
CREATE TABLE nested_categories ( id SERIAL PRIMARY KEY, name TEXT, lft INT NOT NULL, rgt INT NOT NULL ); -- Child nodes SELECT * FROM nested_categories WHERE lft > 2 AND rgt < 15;
Key features:
Can we simply use a nested SELECT with parent_id to find all descendants to any depth?
This works only for shallow depths. For recursive searching, either a recursive CTE (WITH RECURSIVE) or more complex schemes are required, as simple JOINs lead to an enormous number of requests and poor performance.
Example:
WITH RECURSIVE cte AS ( SELECT id, parent_id, name FROM categories WHERE id = 1 UNION ALL SELECT c.id, c.parent_id, c.name FROM categories c INNER JOIN cte ON c.parent_id = cte.id ) SELECT * FROM cte;
Why does the molecular tree (Nested Sets) quickly extract a subtree, but slowly add/remove nodes?
When modifying the tree, the lft/rgt values must be changed for many rows (the change step — all values greater than the insert/delete). The approach is ideal for reading, but for frequent changes, consider other methods.
When to use a Closure Table, instead of parent_id or materialized path?
Closure Table works perfectly for multiple requests for descendants at any level and regular relationship counting. The downside is that it requires more space.
Example:
CREATE TABLE closure ( ancestor INT, descendant INT, depth INT );
In an online store, categories are implemented through parent_id. All nesting is done manually, searching for subcategories involves nested SELECTs.
Pros:
Cons:
Uses materialized path or closure table. All queries for nested categories are instantaneous, recalculation is done through batch scripts.
Pros:
Cons: