Oracle Database 23ai Vector Search in action


Oracle has just announced the release of Oracle Database 23ai, the next long-term support release of the industry-leading Oracle Database, on 2nd May 2024. You can find the announcement here. The new release is packed with tons of new features, strongly focusing on artificial intelligence (AI) and developer productivity. In this post, let’s have a look at the much awaited feature available in Oracle Database 23ai, the Vector Search functionality.

If you have not setup the Oracle Database 23ai yet, you can refer to my blog here, to check how you can download and install Oracle Database 23ai Free on Oracle VM or here, to check how to run it on a Docker.

So let’s jump into it, together.

Create Database User

This step is not mandatory, but I will create a separate user for this instead of using the HR schema, which comes out of the box with Oracle Database 23ai Free.

SQL> create user rishoradev identified by oracle DEFAULT TABLESPACE users TEMPORARY TABLESPACE temp QUOTA UNLIMITED ON users;

User RISHORADEV created.

SQL> grant DB_DEVELOPER_ROLE to rishoradev;

Grant succeeded.

Connect to the new user.

SQL> show user
USER is "RISHORADEV"

Create Table for storing Vectors

Now let us create a table called galaxies. The table stores the names of various galaxies, a short description, and the vector representation of the description retrieved through a language model. Note that we have used a VECTOR type embedding column.

CREATE TABLE galaxies (id NUMBER, name VARCHAR2(50), doc VARCHAR2(500), embedding VECTOR);

Now, insert the data. I have used the same data-set from the Oracle documentation.

INSERT INTO galaxies VALUES (1, 'M31', 'Messier 31 is a barred spiral galaxy in the Andromeda constellation which has a lot of barred spiral galaxies.', '[0,2,2,0,0]');
INSERT INTO galaxies VALUES (2, 'M33', 'Messier 33 is a spiral galaxy in the Triangulum constellation.', '[0,0,1,0,0]');
INSERT INTO galaxies VALUES (3, 'M58', 'Messier 58 is an intermediate barred spiral galaxy in the Virgo constellation.', '[1,1,1,0,0]');
INSERT INTO galaxies VALUES (4, 'M63', 'Messier 63 is a spiral galaxy in the Canes Venatici constellation.', '[0,0,1,0,0]');
INSERT INTO galaxies VALUES (5, 'M77', 'Messier 77 is a barred spiral galaxy in the Cetus constellation.', '[0,1,1,0,0]');
INSERT INTO galaxies VALUES (6, 'M91', 'Messier 91 is a barred spiral galaxy in the Coma Berenices constellation.', '[0,1,1,0,0]');
INSERT INTO galaxies VALUES (7, 'M49', 'Messier 49 is a giant elliptical galaxy in the Virgo constellation.', '[0,0,0,1,1]');
INSERT INTO galaxies VALUES (8, 'M60', 'Messier 60 is an elliptical galaxy in the Virgo constellation.', '[0,0,0,0,1]');
INSERT INTO galaxies VALUES (9, 'NGC1073', 'NGC 1073 is a barred spiral galaxy in Cetus constellation.', '[0,1,1,0,0]');

Vector Search in Action

Now let us calculate the distance between the galaxy M31 and other galaxies. We use the VECTOR_DISTANCE() function to calculate the distance.

SQL> SELECT
g1.name AS galaxy_1,
g2.name AS galaxy_2,
VECTOR_DISTANCE(g2.embedding, g1.embedding) AS distance
FROM galaxies g1, galaxies g2
WHERE g1.id = 1 and g2.id <> 1
ORDER BY distance ASC;

GALAXY_1 GALAXY_2 DISTANCE
___________ ___________ _____________________________
M31 M77 -0.00000011920928955078125
M31 M91 -0.00000011920928955078125
M31 NGC1073 -0.00000011920928955078125
M31 M58 0.1835033893585205
M31 M63 0.2928932309150696
M31 M33 0.2928932309150696
M31 M49 1.0
M31 M60 1.0

8 rows selected.

Notice that we did not pass any calculation metric to the VECTOR_DISTANCE, to calculate the similarity. By default the metric used is Cosine . You can also specify the metric.

VECTOR_DISTANCE(g2.embedding, g1.embedding, COSINE )

You can also use the shorthand operator <=> for Cosine searches.

SQL> SELECT
g1.name AS galaxy_1,
g2.name AS galaxy_2,
g2.embedding <=> g1.embedding AS distance
FROM galaxies g1, galaxies g2
WHERE g1.id = 1 and g2.id <> 1
ORDER BY distance ASC;

GALAXY_1 GALAXY_2 DISTANCE
___________ ___________ _____________________________
M31 M77 -0.00000011920928955078125
M31 M91 -0.00000011920928955078125
M31 NGC1073 -0.00000011920928955078125
M31 M58 0.1835033893585205
M31 M63 0.2928932309150696
M31 M33 0.2928932309150696
M31 M49 1.0
M31 M60 1.0

8 rows selected.

Other options available for the search are:

SQL> SELECT
g1.name AS galaxy_1,
g2.name AS galaxy_2,
VECTOR_DISTANCE(g2.embedding, g1.embedding, EUCLIDEAN) AS distance
FROM galaxies g1, galaxies g2
WHERE g1.id = 1 and g2.id <> 1
ORDER BY distance ASC;

GALAXY_1 GALAXY_2 DISTANCE
___________ ___________ _____________________
M31 M77 1.4142135381698608
M31 M91 1.4142135381698608
M31 NGC1073 1.4142135381698608
M31 M58 1.7320507764816284
M31 M63 2.2360680103302
M31 M33 2.2360680103302
M31 M60 3.0
M31 M49 3.1622776985168457

8 rows selected.

You can also use the shorthand operator <-> for Euclidean searches.

SQL> SELECT
g1.name AS galaxy_1,
g2.name AS galaxy_2,
g2.embedding <-> g1.embedding AS distance
FROM galaxies g1, galaxies g2
WHERE g1.id = 1 and g2.id <> 1
ORDER BY distance ASC;

GALAXY_1 GALAXY_2 DISTANCE
___________ ___________ _____________________
M31 M77 1.4142135381698608
M31 M91 1.4142135381698608
M31 NGC1073 1.4142135381698608
M31 M58 1.7320507764816284
M31 M63 2.2360680103302
M31 M33 2.2360680103302
M31 M60 3.0
M31 M49 3.1622776985168457

8 rows selected.
SQL> SQL> SELECT
g1.name AS galaxy_1,
g2.name AS galaxy_2,
VECTOR_DISTANCE(g2.embedding, g1.embedding, EUCLIDEAN_SQUARED) AS distance
FROM galaxies g1, galaxies g2
WHERE g1.id = 1 and g2.id <> 1
ORDER BY distance ASC;

GALAXY_1 GALAXY_2 DISTANCE
___________ ___________ ___________
M31 M77 2.0
M31 M91 2.0
M31 NGC1073 2.0
M31 M58 3.0
M31 M63 5.0
M31 M33 5.0
M31 M60 9.0
M31 M49 10.0

8 rows selected.
SQL> SELECT
g1.name AS galaxy_1,
g2.name AS galaxy_2,
VECTOR_DISTANCE(g2.embedding, g1.embedding, DOT) AS distance
FROM galaxies g1, galaxies g2
WHERE g1.id = 1 and g2.id <> 1
ORDER BY distance ASC;

GALAXY_1 GALAXY_2 DISTANCE
___________ ___________ ___________
M31 M77 -4.0
M31 M91 -4.0
M31 NGC1073 -4.0
M31 M58 -4.0
M31 M63 -2.0
M31 M33 -2.0
M31 M49 0.0
M31 M60 0.0

8 rows selected.

You can also use the shorthand operator <#> for Dot Product similarities.

SQL> SELECT
g1.name AS galaxy_1,
g2.name AS galaxy_2,
g2.embedding <#> g1.embedding AS distance
FROM galaxies g1, galaxies g2
WHERE g1.id = 1 and g2.id <> 1
ORDER BY distance ASC;

GALAXY_1 GALAXY_2 DISTANCE
___________ ___________ ___________
M31 M77 -4.0
M31 M91 -4.0
M31 NGC1073 -4.0
M31 M58 -4.0
M31 M63 -2.0
M31 M33 -2.0
M31 M49 0.0
M31 M60 0.0

8 rows selected.
SQL> SELECT
g1.name AS galaxy_1,
g2.name AS galaxy_2,
VECTOR_DISTANCE(g2.embedding, g1.embedding, MANHATTAN) AS distance
FROM galaxies g1, galaxies g2
WHERE g1.id = 1 and g2.id <> 1
ORDER BY distance ASC;

GALAXY_1 GALAXY_2 DISTANCE
___________ ___________ ___________
M31 M77 2.0
M31 M91 2.0
M31 NGC1073 2.0
M31 M63 3.0
M31 M58 3.0
M31 M33 3.0
M31 M60 5.0
M31 M49 6.0

8 rows selected.
SQL> SELECT
g1.name AS galaxy_1,
g2.name AS galaxy_2,
VECTOR_DISTANCE(g2.embedding, g1.embedding, HAMMING) AS distance
FROM galaxies g1, galaxies g2
WHERE g1.id = 1 and g2.id <> 1
ORDER BY distance ASC;

GALAXY_1 GALAXY_2 DISTANCE
___________ ___________ ___________
M31 M33 2.0
M31 M77 2.0
M31 M91 2.0
M31 NGC1073 2.0
M31 M63 2.0
M31 M60 3.0
M31 M58 3.0
M31 M49 4.0

8 rows selected.

Well, there you go, you have just learnt the basics of Vector Search. Happy learning!!!

References

Leave a comment