I spend a lot of time writing SQL queries as a data analyst, I work with almost daily. A lot of queries I write are repetitive and commonly used across reports. Before I had various SQL files with different queries that I copy and paste as needed.
But recently I’ve been experimenting with user defined functions (UDFs), and I’m seeing why they’re popular with professional data analysts. I’ll be going into what are user defined functions, why I’m using them, and an example.
Notes:
I primarily work with BigQuery, a Google Cloud Platform product, so I’ll be using its syntax for this post.
This post is for educational purposes only and uses hypothetical data and examples. The information provided here does not reflect real business practices or actual data.
What is a User-Defined Function?
A User-Defined Function (UDF) in BigQuery allows you to define custom logic that can be reused across multiple queries. It’s like creating your own SQL function, either with SQL or JavaScript. If you’re familiar with programming, it’s similar to creating a function in Python.
Why Use UDFS?
UDFs simplify repetitive tasks, improve maintainability, and reduce errors. Here’s a few reasons why I’d use them:
Consistency - Use standardized functions for calculations across queries.
Easier Updates - Change logic in one place instead of updating dozens of queries.
Efficient - Write less SQL by creating a library of complex calculations.
Documentation - Centralize logic and definitions for better collaboration.
How to Write UDFs
There are two ways to write it user defined functions in BigQuery.
SQL-Based UDFs - These UDFs are written entirely in SQL and are best for straightforward calculations or logic. These are simple, fast and use SQL.
Javascript-Based UDFs - These UDFs use JavaScript to define more complex logic and are integrated into BigQuery. Better when you need more flexibility for complex calculations.
When to Choose Which?
Choosing one depends on how complex the logic is and the specific use-case. But below we have some examples of use cases and a comparison between the two.
Use Case | SQL UDF | JavaScript UDF |
Simple calculations | ✅ Ideal | ❌ Overkill |
Complex business logic | ❌ Hard to implement | ✅ Better suited |
String or array manipulation | ❌ Limited | ✅ JavaScript excels here |
Performance-critical queries | ✅ Faster, optimized for BigQuery | ❌ May be slower |
Team familiarity with SQL | ✅ Easy to maintain | ❌ Requires JavaScript skills |
Iterative or recursive logic | ❌ Not possible | ✅ Can handle iterations |
How to Create UDFs
For this you’ll need to create your function first.
SQL Based UDF
CREATE OR REPLACE FUNCTION `project_id.dataset_id.function_name`(
param1 DATA_TYPE,
param2 DATA_TYPE
)
RETURNS RETURN_TYPE AS (
-- SQL Expression Here
);
Javascript Based UDF
CREATE OR REPLACE FUNCTION `project_id.dataset_id.function_name`(
param1 DATA_TYPE,
param2 DATA_TYPE
)
RETURNS RETURN_TYPE
LANGUAGE js AS """
// JavaScript logic here
return ...;
""";
Using UDFs in Queries
After creating the UDF, you can use it in your queries by calling with the correct parameters.
SQL Based UDF
SELECT
column_name,
function_name(param1, param2) AS result
FROM
`project_id.dataset_id.table_name`;
Javascript Based UDF
SELECT
column_name,
function_name(param1, param2) AS result
FROM
`project_id.dataset_id.table_name`;
Example: Customer Lifetime Value
Let’s say we calculate Customer Lifetime Value (CLV), which is a metric used to estimate the total revenue a business can expect from a single customer over their lifetime (time with the business). It’s a good metric to understand how much a business should invest in acquiring (e.g. paying for adverting) and retaining customers (e.g. customer service initiatives). The CLV is formula that’s used across multiple reports:
Formula
CLV = (average_order_value purchase_frequency customer_lifespan) - acquisition_cost
Table
We’ll be extracting the table from ecommerce_data.customers includes:
customer_id
average_order_value (average spend per order)
purchase_frequency (orders per year)
customer_lifespan (years as a customer)
acquisition_cost (cost to acquire the customer)
customer_id | average_order_value | purchase_frequency | customer_lifespan | acquisition_cost |
1 | 50.0 | 2.0 | 5.0 | 20.0 |
2 | 70.0 | 1.5 | 4.0 | 30.0 |
3 | 100.0 | 3.0 | 6.0 | 40.0 |
Creating the Functions
SQL Based UDF
CREATE OR REPLACE FUNCTION project_id.dataset_id.calculate_clv(
average_order_value FLOAT64,
purchase_frequency FLOAT64,
customer_lifespan FLOAT64,
acquisition_cost FLOAT64
)
RETURNS FLOAT64 AS (
(average_order_value * purchase_frequency * customer_lifespan) - acquisition_cost
);
Javascript Based UDF
CREATE OR REPLACE FUNCTION project_id.dataset_id.calculate_clv_js(
average_order_value FLOAT64,
purchase_frequency FLOAT64,
customer_lifespan FLOAT64,
acquisition_cost FLOAT64
)
RETURNS FLOAT64
LANGUAGE js AS """
return (average_order_value * purchase_frequency * customer_lifespan) - acquisition_cost;
""";
Query Examples
SQL Based
Here’s how you’d use it in an actual query.
SELECT
customer_id,
calculate_clv(
average_order_value,
purchase_frequency,
customer_lifespan,
acquisition_cost
) AS clv
FROM
`project_id.dataset_id.customers`;
Javascript
SELECT
customer_id,
calculate_clv_js(
average_order_value,
purchase_frequency,
customer_lifespan,
acquisition_cost
) AS clv
FROM
`project_id.dataset_id.customers`;
Both return:
customer_id | clv |
1 | 480.0 |
2 | 390.0 |
3 | 1760.0 |
Without a UDF
If you didn’t have a UDF, the query would be written as:
SELECT
customer_id,
average_order_value * purchase_frequency * customer_lifespan) - acquisition_cost AS clv
FROM
`project_id.dataset_id.customers`;
Why This Helps
In the example above it might seem longer to write the UDF instead of just writing the equation for CLV. There are a few reasons why you’d want to do this:
Instead of repeating the CLV formula across a bunch of queries, you use calculate_clv() or calculate_clv_js() wherever it’s needed, reducing redundancy.
If the marketing team changes the acquisition cost calculation, you only update the UDF, and all dependent queries automatically reflect the change. Instead of having to manually update each.
How I’m Using It
I’m using both at my work
Javascript for our reports (which uses a platform called Dataform that uses a combination of SQL and Javascript)
SQL Based for custom queries or one off requests
I’m just in the beginning stages of using it but as time progresses I’ll keep you updated.
Conclusion
User-Defined Functions in BigQuery are a powerful way to streamline workflows because it saves time writing and updating queries, reduces risk of errors, and centralizes definitions. I’m personally excited to see how it helps in my workflow. Whether you use SQL or JavaScript, UDFs help you write cleaner, more efficient SQL and ensure your queries are reliable and adaptable.