Fractals in BigQuery
Doing recursive queries without the option of using the RECURSIVE clause
I’ve been working a lot with cloud databases lately. They say the best way to get good at a language is to learn by doing. The obvious problem is that if you don’t have access to a gigantic database of real-world data, toy databases will only get you so far. That’s why for this Bigquery tutorial, I was inspired by Michael Malis’ presentation on SQL Fractals. I figured it would be interesting to try this out with this in BigQuery. You can follow the code below either in the terminal or using the Google BigQuery Console.
The Fractals
Michael Malis’ SQL fractals made use of an SQL technique called recursive CTEs. Recursive CTEs are normally used to identify the hierarchies of data, such as an organizational structure, employee-manager, bill-of-materials, and document hierarchy. In this case, they provide a way to express fractal iteration in the queries.
With some work, Michael was able map fractals of two types into SQL queries:
- Escape-time fractals: A class of fractals based on repeatedly evaluating a formula at every point in a grid. In the original SQL formulation, use one recursive CTE to iterate through every point in a grid. I then use a second recursive CTE to repeatedly evaluate a function over each point to test whether or not the point is part of the fractal.
- Self-similar fractals: A class of fractals that can be described by what’s known as an ”L-system”:. An L-system encodes a fractal as an initial string and rules for modifying that string. It is then possible to decode the fractal from the string. I wrote a SQL query that takes an arbitrary L-system. It runs a few iterations of the L-system and then maps the resulting string back to the fractal. Mapping the L-system to the final fractal is a bit tedious, but not too hard.
This all seems pretty straightforward, but we have a problem.
While many relational databases (e.g., Teradata, Oracle, etc.) support recursive queries, Google BigQuery does not support features like Recursive CTE or VIEWS. By extention, Google BigQuery does not support either WITH RECURSIVE
clause or using the RECURSIVE
clause in a CREATE VIEW
statement.
Without these features, we’re going to need to get creative with some alternatives to RECURSIVE
.
Sierpinski’s Triangle.
Michael Malis’ original version
The Sierpinski’s Triangle is a fractal that will be familiar to most people. It’s probably the first fractal you learned to make on your TI-84 calculator, and has been the subject of far too many “Illuminati Confirmed” memes (give it a rest people, that meme’s getting old).
This falls into the category of escape-time fractals. The BigQuery query for our fractal is below, which you can ompare to the original:
WITH cells AS (
SELECT r, c
FROM UNNEST(GENERATE_ARRAY(0, 63)) AS r
CROSS JOIN UNNEST(GENERATE_ARRAY(0, 63)) AS c
),
marked_points AS (
SELECT r,
c,
CASE WHEN r & c = 0
THEN '**'
ELSE ' ' END AS marker
FROM cells
),
combined_rows AS (
SELECT string_agg(marker, '') AS row_text
FROM marked_points
GROUP BY r
ORDER BY r DESC
)
SELECT string_agg(row_text, '\n') FROM combined_rows;
…which outputs
**
****
** **
********
** **
**** ****
** ** ** **
****************
** **
**** ****
** ** ** **
******** ********
** ** ** **
**** **** **** ****
** ** ** ** ** ** ** **
********************************
** **
**** ****
** ** ** **
******** ********
** ** ** **
**** **** **** ****
** ** ** ** ** ** ** **
**************** ****************
** ** ** **
**** **** **** ****
** ** ** ** ** ** ** **
******** ******** ******** ********
** ** ** ** ** ** ** **
**** **** **** **** **** **** **** ****
** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **
****************************************************************
** **
**** ****
** ** ** **
******** ********
** ** ** **
**** **** **** ****
** ** ** ** ** ** ** **
**************** ****************
** ** ** **
**** **** **** ****
** ** ** ** ** ** ** **
******** ******** ******** ********
** ** ** ** ** ** ** **
**** **** **** **** **** **** **** ****
** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **
******************************** ********************************
** ** ** **
**** **** **** ****
** ** ** ** ** ** ** **
******** ******** ******** ********
** ** ** ** ** ** ** **
**** **** **** **** **** **** **** ****
** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **
**************** **************** **************** ****************
** ** ** ** ** ** ** **
**** **** **** **** **** **** **** ****
** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **
******** ******** ******** ******** ******** ******** ******** ********
** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **
**** **** **** **** **** **** **** **** **** **** **** **** **** **** **** ****
** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **
********************************************************************************************************************************
Mandelbrot Set
Michael Malis’ original version
The Mandelbrot set is the set of complex numbers for which the function does not diverge when iterated from , i.e., for which the sequence , , etc., remains bounded in absolute value.
Once again, this falls into the category of escape-time fractals. However this time, we need to find a more clever way of doing without the RECURSIVE
functionality we’d rely on with an SQL language like Postgres. The BigQuery query for our fractal is below, which you can ompare to the original:
WITH RECURSIVE points AS (
SELECT r, c FROM UNNEST(GENERATE_ARRAY(-2, 2, 0.05)) AS r
CROSS JOIN UNNEST(GENERATE_ARRAY(-2, 1, 0.05)) AS c
ORDER BY r DESC, c ASC
),
iterations AS (
SELECT r, c, CAST(0.0 as FLOAT64) AS zr, CAST(0.0 as FLOAT64) AS zc, 0 AS iteration FROM points
UNION ALL
SELECT r, c, zr*zr - zc*zc + c AS zr, 2*zr*zc + r AS zc, iteration+1 AS iteration
FROM iterations WHERE zr*zr + zc*zc < 4 AND iteration < 1000
),
final_iteration AS (
SELECT * FROM iterations WHERE iteration = 1000
),
marked_points AS (
SELECT r, c, (CASE WHEN EXISTS (SELECT 1 FROM final_iteration i WHERE p.r = i.r AND p.c = i.c)
THEN '**'
ELSE ' '
END) AS marker
FROM points p
ORDER BY r DESC, c ASC
),
lines AS (
SELECT r, string_agg(marker, '') AS r_text
FROM marked_points
GROUP BY r
ORDER BY r DESC
)
SELECT string_agg(r_text, '\n') FROM lines;
…which outputs
**
Julia Set Fractal
Michael Malis’ original version
The Julia Set fractal.
Now we’re getting into the territory of self-similar fractals, which will be a bit tricker than the escape-time fractals. Once again we need to find a way of doing without the RECURSIVE
functionality we’d rely on with an SQL language like Postgres. The BigQuery query for our fractal is below, which you can ompare to the original:
WITH RECURSIVE points AS (
SELECT ROW, col FROM UNNEST(GENERATE_ARRAY(-2, 2, 0.05)) AS a(ROW)
CROSS JOIN UNNEST(GENERATE_ARRAY(-2, 2, 0.05)) AS b(col)
ORDER BY ROW DESC, col ASC
),
iterations AS (
SELECT ROW, col, col::float AS zr, row::float AS zc, 0 AS iteration FROM points
UNION ALL
SELECT ROW, col, zr*zr - zc*zc + 1 - 1.61803398875 AS zr, 2*zr*zc AS zc, iteration+1 AS iteration
FROM iterations WHERE zr*zr + zc*zc < 4 AND iteration < 1000
),
final_iteration AS (
SELECT * FROM iterations WHERE iteration = 1000
),
marked_points AS (
SELECT ROW, col, (CASE WHEN EXISTS (SELECT 1 FROM final_iteration i WHERE p.ROW = i.ROW AND p.col = i.col)
THEN '**'
ELSE ' '
END) AS marker
FROM points p
ORDER BY ROW DESC, col ASC
),
ROWS AS (
SELECT ROW, string_agg(marker, '') AS row_text
FROM marked_points
GROUP BY row
ORDER BY ROW DESC
)
SELECT string_agg(row_text, '\n') FROM ROWS;
…which outputs
**
Hilbert Curve
Michael Malis’ original version
The Hilbert Curve is a continuous fractal space-filling curve first described by the German mathematician David Hilbert in 1891.
As a self-similar fractal, the learning curve is beginning to steepen. Once again we need to find a way of doing without the RECURSIVE
functionality we’d rely on with an SQL language like Postgres. The BigQuery query for our fractal is below, which you can ompare to the original:
WITH RECURSIVE iterations AS (
SELECT 'A' AS PATH, 0 AS iteration
UNION ALL
SELECT replace(replace(replace(PATH, 'A', '-CF+AFA+FC-'), 'B', '+AF-BFB-FA+'), 'C', 'B'), iteration+1 AS iteration
FROM iterations WHERE iteration < 3
),
segments AS (
SELECT 0 AS r1, 0 AS c1, 0 AS r2, 0 AS c2, 0 AS r3, 0 AS c3, 0 AS dr, 1 AS dc, (SELECT path FROM iterations ORDER BY iteration DESC LIMIT 1) AS path_left
UNION ALL
SELECT r3 AS r1, c3 AS c1, r3 + dr * movement AS r2, c3 + dc * movement AS c2, r3 + 2 * dr * movement AS r3, c3 + 2 * dc * movement AS c3, dr, dc, SUBSTRING(path_left FROM 2) AS path_left
FROM (
SELECT r1, c1, r2, c2, r3, c3,
CASE WHEN SUBSTRING(path_left FOR 1) = '-' THEN -dc
WHEN SUBSTRING(path_left FOR 1) = '+' THEN dc
ELSE dr
END AS dr,
CASE WHEN SUBSTRING(path_left FOR 1) = '-' THEN dr
WHEN SUBSTRING(path_left FOR 1) = '+' THEN -dr
ELSE dc
END AS dc,
path_left,
CASE WHEN SUBSTRING(path_left FOR 1) = 'F' THEN 1 ELSE 0 END AS movement
FROM segments
WHERE CHAR_LENGTH(path_left) > 0
) sub
),
end_points AS (SELECT r1 AS r, c1 AS c FROM segments UNION SELECT r3, c3 FROM segments),
points AS (
SELECT r, c FROM UNNEST(GENERATE_ARRAY((SELECT MIN(r) FROM end_points), (SELECT MAX(r) FROM end_points))) AS a(r)
CROSS JOIN UNNEST(GENERATE_ARRAY((SELECT MIN(c) FROM end_points), (SELECT MAX(c) FROM end_points))) AS b(c)
),
marked_points AS (
SELECT r, c, (CASE WHEN
EXISTS (SELECT 1 FROM end_points e WHERE p.r = e.r AND p.c = e.c)
THEN '*'
WHEN EXISTS (SELECT 1 FROM segments s WHERE p.r = s.r2 AND p.c = s.c2 AND dc != 0)
THEN '-'
WHEN EXISTS (SELECT 1 FROM segments s WHERE p.r = s.r2 AND p.c = s.c2 AND dr != 0)
THEN '|'
ELSE ' '
END
) AS marker
FROM points p
),
lines AS (
SELECT r, string_agg(marker, '') AS row_text
FROM marked_points
GROUP BY r
ORDER BY r DESC
)
SELECT string_agg(row_text, E'\n') FROM lines;
…which outputs
**
Compare to the original below:
Dragon Curve
The Dragon Curve
If there was a ‘hard mode’ for the self-similar fractal queries, this would be it. Once again we need to find a way of doing without the RECURSIVE
functionality we’d rely on with an SQL language like Postgres. The BigQuery query for our fractal is below, which you can compare to the original:
WITH RECURSIVE iterations AS (
SELECT 'FX' AS PATH, 0 AS iteration
UNION ALL
SELECT replace(replace(replace(PATH, 'X', 'X+ZF+'), 'Y', '-FX-Y'), 'Z', 'Y'), iteration+1 AS iteration
FROM iterations WHERE iteration < 8
),
segments AS (
SELECT 0 AS r1, 0 AS c1, 0 AS r2, 0 AS c2, 0 AS r3, 0 AS c3, 0 AS dr, 1 AS dc, (SELECT path FROM iterations ORDER BY iteration DESC LIMIT 1) AS path_left
UNION ALL
SELECT r3 AS r1, c3 AS c1, r3 + dr * movement AS r2, c3 + dc * movement AS c2, r3 + 2 * dr * movement AS r3, c3 + 2 * dc * movement AS c3, dr, dc, SUBSTRING(path_left FROM 2) AS path_left
FROM (
SELECT r1, c1, r2, c2, r3, c3,
CASE WHEN SUBSTRING(path_left FOR 1) = '-' THEN -dc
WHEN SUBSTRING(path_left FOR 1) = '+' THEN dc
ELSE dr
END AS dr,
CASE WHEN SUBSTRING(path_left FOR 1) = '-' THEN dr
WHEN SUBSTRING(path_left FOR 1) = '+' THEN -dr
ELSE dc
END AS dc,
path_left,
CASE WHEN SUBSTRING(path_left FOR 1) = 'F' THEN 1 ELSE 0 END AS movement
FROM segments
WHERE CHAR_LENGTH(path_left) > 0
) sub
),
end_points AS (
SELECT r1 AS r, c1 AS c FROM segments UNION SELECT r3, c3 FROM segments
),
points AS (
SELECT r, c FROM UNNEST(GENERATE_ARRAY((SELECT MIN(r) FROM end_points), (SELECT MAX(r) FROM end_points))) AS a(r)
CROSS JOIN UNNEST(GENERATE_ARRAY((SELECT MIN(c) FROM end_points), (SELECT MAX(c) FROM end_points))) AS b(c)
),
marked_points AS (
SELECT r, c, (CASE WHEN
EXISTS (SELECT 1 FROM end_points e WHERE p.r = e.r AND p.c = e.c)
THEN '*'
WHEN EXISTS (SELECT 1 FROM segments s WHERE p.r = s.r2 AND p.c = s.c2 AND dc != 0)
THEN '-'
WHEN EXISTS (SELECT 1 FROM segments s WHERE p.r = s.r2 AND p.c = s.c2 AND dr != 0)
THEN '|'
ELSE ' '
END
) AS marker
FROM points p
),
lines AS (
SELECT r, string_agg(marker, '') AS row_text
FROM marked_points
GROUP BY r
ORDER BY r DESC
)
SELECT string_agg(row_text, '\n') FROM lines;
…which outputs
**
Board
Our final self-similar fractal. Once again we need to find a way of doing without the RECURSIVE
functionality we’d rely on with an SQL language like Postgres. The BigQuery query for our fractal is below, which you can ompare to the original:
WITH RECURSIVE iterations AS (
SELECT 'F+F+F+F' AS PATH, 0 AS iteration
UNION ALL
SELECT replace(replace(replace(PATH, 'F', 'FF+F+F+F+FF'), 'R', '+LF-RFR-FL+'), 'Z', 'R'), iteration+1 AS iteration
FROM iterations WHERE iteration < 3
),
segments AS (
SELECT 0 AS r1, 0 AS c1, 0 AS r2, 0 AS c2, 0 AS r3, 0 AS c3, 0 AS dr, 1 AS dc, (SELECT path FROM iterations ORDER BY iteration DESC LIMIT 1) AS path_left
UNION ALL
SELECT r3 AS r1, c3 AS c1, r3 + dr * movement AS r2, c3 + dc * movement AS c2, r3 + 2 * dr * movement AS r3, c3 + 2 * dc * movement AS c3, dr, dc, SUBSTRING(path_left FROM 2) AS path_left
FROM (
SELECT r1, c1, r2, c2, r3, c3,
CASE WHEN SUBSTRING(path_left FOR 1) = '-' THEN -dc
WHEN SUBSTRING(path_left FOR 1) = '+' THEN dc
ELSE dr
END AS dr,
CASE WHEN SUBSTRING(path_left FOR 1) = '-' THEN dr
WHEN SUBSTRING(path_left FOR 1) = '+' THEN -dr
ELSE dc
END AS dc,
path_left,
CASE WHEN SUBSTRING(path_left FOR 1) = 'F' THEN 1 ELSE 0 END AS movement
FROM segments
WHERE CHAR_LENGTH(path_left) > 0
) sub
),
end_points AS (
SELECT r1 AS r, c1 AS c FROM segments UNION SELECT r3, c3 FROM segments
),
points AS (
SELECT r, c FROM UNNEST(GENERATE_ARRAY((SELECT MIN(r) FROM end_points), (SELECT MAX(r) FROM end_points))) AS a(r)
CROSS JOIN UNNEST(GENERATE_ARRAY((SELECT MIN(c) FROM end_points), (SELECT MAX(c) FROM end_points))) AS b(c)
),
marked_points AS (
SELECT r, c, (CASE WHEN
EXISTS (SELECT 1 FROM end_points e WHERE p.r = e.r AND p.c = e.c)
THEN '*'
WHEN EXISTS (SELECT 1 FROM segments s WHERE p.r = s.r2 AND p.c = s.c2 AND dc != 0)
THEN '-'
WHEN EXISTS (SELECT 1 FROM segments s WHERE p.r = s.r2 AND p.c = s.c2 AND dr != 0)
THEN '|'
ELSE ' '
END
) AS marker
FROM points p
),
lines AS (
SELECT r, string_agg(marker, '') AS row_text
FROM marked_points
GROUP BY r
ORDER BY r DESC
)
SELECT string_agg(row_text, '\n') FROM lines;
…which outputs
**
End Notes
This is a fun trick and all. Of course, when it comes to interesting code snippets there’s always a bigger fish. I’d recommend checking out Fabien’s implementation of a Turing Machine built with SQL queries.
Cited as:
@article{mcateer2019fracpost,
title = "Fractals in BigQuery",
author = "McAteer, Matthew",
journal = "matthewmcateer.me",
year = "2019",
url = "https://matthewmcateer.me/blog/fractals-in-bigquery/"
}
If you notice mistakes and errors in this post, don’t hesitate to contact me at [contact at matthewmcateer dot me] and I would be very happy to correct them right away!
See you in the next post 😄