Oracle SQL notes

Marc Deveaux
7 min readFeb 8, 2024

DenseRank and rank functions

Calculate a rank of a row in an ordered set of rows. Unlike the RANK() function, the DENSE_RANK() function returns rank values as consecutive integers. It does not skip rank in case of ties. Rows with the same values for the rank criteria will receive the same rank values.

https://www.oracletutorial.com/oracle-analytic-functions/oracle-dense_rank/

SELECT
product_name,
list_price,
RANK() OVER(ORDER BY list_price)
FROM products;

-- get the top 10 cheapest product
WITH cte_products AS(
SELECT
product_name,
list_price,
RANK() OVER(ORDER BY list_price) my_rank
FROM products
)
SELECT * FROM cte_products WHERE my_rank <= 10;

-- get top 5 cheapest products with partition by
WITH cte_products AS(
SELECT
product_name,
category_id,
list_price,
RANK() OVER (PARTITION BY category_id
ORDER BY list_price ) my_rank
FROM products
)
SELECT * FROM cte_products
WHERE my_rank <= 5;

Execution time in sql dev

The time indicated when you execute a query is not the real one, you would need to do ctrl + end on the query output to execute the real query and see the real execution time. Count(*) doesn’t work neither as the optimizer could take a different road

Dealing with Dates which are stored in VARCHAR in a WHERE condition

rule #1, no substr function in WHERE or JOIN.

-- don't do this
TO_DATE(SUBSTR(TO_CHAR(XXXDATETIME),1,8),'YYYYMMDD')"

-- slightly faster but still a bad idea
CAST(TO_TIMESTAMP(XXXDATETIME,'YYYYMMDDHH24MISSFF') AS DATE)

If you must filter on a WHERE and your VARCHAR format is YYYYDDMM… you can use between while keeping everything in string

-- best (after asking the DBA team to change the data type)
WITH PARAMS AS (
SELECT
'20231101000000000' AS START_DATE,
'20240127000000000' AS END_DATE
FROM DUAL
),
...
WHERE VARCHAR_DATETIME BETWEEN PARAMS.START_DATE AND PARAMS.END_DATE

ISO8601 for DateTime

-- to know your settings
SELECT *
FROM NLS_Session_Parameters
WHERE Parameter = 'NLS_TIMESTAMP_TZ_FORMAT'; -- NLS_DATE_FORMAT

-- change
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';

-- put in iso a varchar datetime col which looks like 20240105000140480
SELECT DATETIME
, TO_CHAR(FROM_TZ(TO_TIMESTAMP(TO_CHAR(DATETIME),'YYYYMMDDHH24MISSFF'),'Europe/Paris'),
'YYYY-MM-DD"T"HH24:MI:SS.FF3TZHTZM') AS B
FROM SOME_DB.SOME_TABLE;

Dealing with accents and special characters

-- technically the way to do it is with decompose accents
DECOMPOSE()

-- another workaround is
SELECT CONVERT('mercièèüöàä', 'US7ASCII') FROM DUAL

-- remove special characters
SELECT REGEXP_REPLACE('MTT..-lom','[^0-9A-Za-z]', ' ') FROM DUAL;

-- what about &? this is for substitution variable and if it is in your string
-- it will prompt for input. To avoid this you have to use before:
SET DEFINE OFF;

Listagg

Transform data from multiple rows into a single list of values separated by a specified delimiter

-- returns a comma-separated list of employees for each job title
SELECT
job_title,
LISTAGG( first_name, ',' )
WITHIN GROUP(ORDER BY first_name) AS employees
FROM employees
GROUP BY job_title

-- template to handle overflow
LISTAGG([ALL] column_name [, delimiter] ON OVERFLOW TRUNCATE
) WITHIN GROUP(ORDER BY sort_expression);

Sub query dates defined in dual

WITH PARAMS AS (
SELECT
TO_DATE('15.11.2023','DD.MM.YYYY') AS START_DATE,
TO_DATE('20.11.2023','DD.MM.YYYY') AS END_DATE
FROM DUAL
)
...
FROM XXXX
CROSS JOIN PARAMS -- or LEFT JOIN ON 1 = 1

Retrieve other tables names

SELECT 
col.column_id,
col.owner as schema_name,
col.table_name,
col.column_name,
col.data_type,
col.data_length,
col.data_precision,
col.data_scale,
col.nullable from sys.all_tab_columns col
-- where owner like '%DB_NAME%'
WHERE col.table_name like '%XXXX%' and col.column_name like '%NEXT%DATE%'

Pivot

SELECT * 
FROM SOME_TBL
PIVOT (SUM(XXXXX) FOR DTE
IN (${YESTERDAY} AS YESTERDAY,
${CURRENT_DATE} AS CURRENT_DAY)

Random sample

WITH AAA (
SELECT XXXX
, DBMS_RANDOM.VALUE rnd
FROM XXXX)
SELECT * FROM AAAA
WHERE rnd < 0.1 -- select 10% of the query

See indexes for a given table

SELECT * FROM all_indexes WHERE table_name = 'xxxx'
-- don't forget the quotation marks!

Column names convention

https://www.sqlstyle.guide/#overview

  • _id—a unique identifier such as a column that is a primary key.sq
  • _status—flag value or some other status of any type such as publication_status.
  • _total—the total or sum of a collection of values.
  • _num—denotes the field contains any kind of number.
  • _name—signifies a name such as first_name.
  • _seq—contains a contiguous sequence of values.
  • _date—denotes a column that contains the date of something.
  • _tally—a count.
  • _size—the size of something such as a file size or clothing.
  • _addr—an address for the record could be physical or intangible such as ip_addr.

Tips on optimizing queries

Sources:

Union and Union all

  • UNION removes duplicates and UNION ALL does not
  • In Oracle, an extra step is performed when using a UNION to remove all duplicate rows from the result set after it is combined. It’s the same as performing a DISTINCT

Avoid modifying indexed columns with functions…

Do not modify indexed columns using Oracle built-in functions like TO_DATE, TRIM, LOWER etc. Using those functions on an indexed column will disable the optimizer from identifying the index.

So more generally, avoid function in Where clause or try to use a function on the other side of condition and in the end.

If not possible, consider creating a function-based index on the column. This is a type of index that is created on the result of a function applied to the column, which could be used in this query. -> most likely we don’t have the right for this

https://www.oracletutorial.com/oracle-index/oracle-function-based-index/

-- A function-based index calculates the result of a function that involves one or more columns and stores that result in the index.
CREATE INDEX index_name
ON table_name (expression);

-- The following statement creates a function-based index based on the UPPER function:
CREATE INDEX members_last_name_fi
ON members(UPPER(last_name));

-- Now, if you find members by the last name, the query optimizer will consider the index as shown in the following query plan
EXPLAIN PLAN FOR
SELECT * FROM members
WHERE UPPER(last_name) = 'SANS';

SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY());

No functions in joins/complex joins

Functions like “SUBSTRING”, “REGEXP_SUBSTR”, “COALESCES” are not recommended in join and more broadly in where statement as well.

Similarly, do not join on TO_CHAR, use native column (or get bad performance). Also do not filter on TO_CHAR function, at least on the column side when the value to be filtered is fixed. You should filter on fixed value with TO_DATE function.

Focus on bind variable instead of literals

If you have the same query several times but with different values, use bind variable instead so that the optimizer doesn’t see it as different queries. “The use of literal values will cause many unique statements to be cached because each literal value is treated as different”

-- substitution variables can replace SQL*Plus command options or other hard-coded text
define value1 = 'sysdate'
SELECT &&value1 from dual;

-- bind variables store data values for SQL and PL/SQL statements executed in the RDBMS; they can hold single values or complete result sets
var x number;
exec :x := 10;
select :x from dual;

exec select count(*) into :x from dual;
exec print x;

Avoid using OR in join conditions

This results in a decrease of the queries’ speed for at least a factor of 2. It’s a far better solution to use UNION ALL operator and join two tables.

OUTER join tips

  • Do not put OUTER JOIN by default, only if necessary.
  • Do not use OUTER JOIN on composite key
  • Avoid having many sub-queries with LEFT OUTER JOIN

Use ANSI joins instead of Oracle joins

It turned out there is another way to join table which is specific to Oracle. Example for the curiosity as apparently it is a bad practice

-- oracle inner join
SELECT emp.*, dept.*
FROM emp, dept
WHERE emp.dept_id = dept.id;

-- oracle left join
SELECT emp.*, dept.*
FROM emp, dept
WHERE emp.dept_id = dept.id(+);

Use CASE Instead of Multiple Unions

-- bad
SELECT id, product_name FROM product
WHERE status = 'X' AND created_date < TO_DATE('2017-01-01', 'YYYY-MM-DD')
UNION ALL
SELECT id, product_name
FROM product
WHERE status = 'A' AND product_series = 'WXT';

-- good
SELECT id, product_name
FROM (
SELECT id, product_name,
CASE
WHEN status = 'X' AND created_date <

TO_DATE('2017-01-01', 'YYYY-MM-DD') THEN 1
WHEN status = 'A' AND product_series = 'WXT' THEN 1
ELSE 0 END AS prodcheck
FROM product
) sub
WHERE prodcheck = 1;

Minimize the use of DISTINCT

If your result set is displaying data from many different tables, you might end up getting some duplicate results. It can be tempting to add a DISTINCT keyword to ensure you don’t get duplicate records. But adding a DISTINCT keyword will likely cause an expensive operation to be performed on your query, slowing it down. It will give you the results you need, but it’s masking a problem elsewhere

Avoid Use of the System Dual Table

Use of the system DUAL table for returning the sysdate or other constants is inefficient and should be avoided when not required.

-- instead of
SELECT DEPARTMENT_ID ID, (SELECT SYSDATE FROM DUAL) TODAYS_DATE FROM DEPARTMENTS WHERE DEPARTMENT_ID IN (:P_DEPT_ID)
-- do
SELECT DEPARTMENT_ID ID, SYSDATE TODAYS_DATE FROM DEPARTMENTS WHERE DEPARTMENT_ID IN (:P_DEPT_ID)

Avoid Nested Data Sets

The data model provides a mechanism to create parent-child hierarchy by linking elements from one data set to another. At run time, the data processor executes the parent query and for each row in the parent executes the child query. When a data model has many nested parent-child relationships slow processing can result.

A better approach to avoid nested data sets is to combine multiple data set queries into a single query using the WITH clause.

-- Query Q1:
SELECT DEPARTMENT_ID EDID, EMPLOYEE_ID EID, FIRST_NAME FNAME, LAST_NAME LNAME, SALARY SAL, COMMISSION_PCT COMM
FROM EMPLOYEES

-- Query Q2:
SELECT DEPARTMENT_ID DID, DEPARTMENT_NAME DNAME, LOCATION_ID LOC
FROM DEPARTMENTS

-- use with
WITH Q1 as (SELECT DEPARTMENT_ID DID,DEPARTMENT_NAME DNAME,LOCATION_ID LOC
FROM DEPARTMENTS),
Q2 as (SELECT DEPARTMENT_ID EDID,EMPLOYEE_ID EID,FIRST_NAME FNAME,LAST_NAME LNAME,SALARY SAL,COMMISSION_PCT COMM
FROM EMPLOYEES)

SELECT Q1.*, Q2.*
FROM Q1 LEFT JOIN Q2
ON Q1.DID=Q2.EDID

COALESCES instead of CASE with null

-- to avoid
CASE WHEN COL1 IS NULL THEN COL2

-- use instead
COASLESCE(COL1, COL2)

Avoid using “IN” with subqueries into your SQL query

  • The goal is to replace the “IN” WHERE clause by an INNER JOIN instead
-- AVOID IN + subquery
...
WHERE A_ID IN (
SELECT max(A_ID)
FROM XXX
WHERE ...
)
-- Replace by
...
INNER JOIN (
SELECT max(A_ID) AS MAX_A_ID
FROM XXX
WHERE ...
) SUB ON XXX.A_ID = SUB.MAX_A_ID
WHERE
...

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Marc Deveaux
Marc Deveaux

No responses yet

Write a response