top of page
Kelly Adams

User Defined Functions in BigQuery

I spend a lot of time writing SQL queries as a data analyst, I work with almost daily. A lot of queries I write are repetitive and commonly used across reports. Before I had various SQL files with different queries that I copy and paste as needed.

But recently I’ve been experimenting with user defined functions (UDFs), and I’m seeing why they’re popular with professional data analysts. I’ll be going into what are user defined functions, why I’m using them, and an example.


Notes: 

  • I primarily work with BigQuery, a Google Cloud Platform product, so I’ll be using its syntax for this post.

  • This post is for educational purposes only and uses hypothetical data and examples. The information provided here does not reflect real business practices or actual data.


What is a User-Defined Function?

User-Defined Function (UDF) in BigQuery allows you to define custom logic that can be reused across multiple queries. It’s like creating your own SQL function, either with SQL or JavaScript. If you’re familiar with programming, it’s similar to creating a function in Python. 


Why Use UDFS? 

UDFs simplify repetitive tasks, improve maintainability, and reduce errors. Here’s a few reasons why I’d use them:

  • Consistency  - Use standardized functions for calculations across queries.

  • Easier Updates - Change logic in one place instead of updating dozens of queries.

  • Efficient - Write less SQL by creating a library of complex calculations.

  • Documentation - Centralize logic and definitions for better collaboration.


How to Write UDFs

There are two ways to write it user defined functions in BigQuery. 

  1. SQL-Based UDFs - These UDFs are written entirely in SQL and are best for straightforward calculations or logic. These are simple, fast and use SQL. 

  2. Javascript-Based UDFs - These UDFs use JavaScript to define more complex logic and are integrated into BigQuery. Better when you need more flexibility for complex calculations. 


When to Choose Which?

Choosing one depends on how complex the logic is and the specific use-case. But below we have some examples of use cases and a comparison between the two. 


Use Case

SQL UDF

JavaScript UDF

Simple calculations

✅ Ideal

❌ Overkill

Complex business logic

❌ Hard to implement

✅ Better suited

String or array manipulation

❌ Limited

✅ JavaScript excels here

Performance-critical queries

✅ Faster, optimized for BigQuery

❌ May be slower

Team familiarity with SQL

✅ Easy to maintain

❌ Requires JavaScript skills

Iterative or recursive logic

❌ Not possible

✅ Can handle iterations


How to Create UDFs

For this you’ll need to create your function first. 


SQL Based UDF

CREATE OR REPLACE FUNCTION `project_id.dataset_id.function_name`(
  param1 DATA_TYPE,
  param2 DATA_TYPE
)
RETURNS RETURN_TYPE AS (
  -- SQL Expression Here
);

Javascript Based UDF

CREATE OR REPLACE FUNCTION `project_id.dataset_id.function_name`(
  param1 DATA_TYPE,
  param2 DATA_TYPE
)
RETURNS RETURN_TYPE
LANGUAGE js AS """
  // JavaScript logic here
  return ...;
""";

Using UDFs in Queries

After creating the UDF, you can use it in your queries by calling with the correct parameters.


SQL Based UDF

SELECT
  column_name,
  function_name(param1, param2) AS result
FROM
  `project_id.dataset_id.table_name`;

Javascript Based UDF

SELECT
  column_name,
  function_name(param1, param2) AS result
FROM
  `project_id.dataset_id.table_name`;


Example: Customer Lifetime Value

Let’s say we calculate Customer Lifetime Value (CLV), which is a metric used to estimate the total revenue a business can expect from a single customer over their lifetime (time with the business). It’s a good metric to understand how much a business should invest in acquiring (e.g. paying for adverting) and retaining customers (e.g. customer service initiatives). The CLV is formula that’s used across multiple reports:


Formula

CLV = (average_order_value purchase_frequency customer_lifespan) - acquisition_cost

Table

We’ll be extracting the table from ecommerce_data.customers includes:

  • customer_id

  • average_order_value (average spend per order)

  • purchase_frequency (orders per year)

  • customer_lifespan (years as a customer)

  • acquisition_cost (cost to acquire the customer)

customer_id

average_order_value

purchase_frequency

customer_lifespan

acquisition_cost

1

50.0

2.0

5.0

20.0

2

70.0

1.5

4.0

30.0

3

100.0

3.0

6.0

40.0

Creating the Functions

SQL Based UDF

CREATE OR REPLACE FUNCTION project_id.dataset_id.calculate_clv(
  average_order_value FLOAT64,
  purchase_frequency FLOAT64,
  customer_lifespan FLOAT64,
  acquisition_cost FLOAT64
)
RETURNS FLOAT64 AS (
  (average_order_value * purchase_frequency * customer_lifespan) - acquisition_cost
);

Javascript Based UDF

CREATE OR REPLACE FUNCTION project_id.dataset_id.calculate_clv_js(
  average_order_value FLOAT64,
  purchase_frequency FLOAT64,
  customer_lifespan FLOAT64,
  acquisition_cost FLOAT64
)
RETURNS FLOAT64
LANGUAGE js AS """
  return (average_order_value * purchase_frequency * customer_lifespan) - acquisition_cost;
""";

Query Examples


SQL Based 

Here’s how you’d use it in an actual query. 

SELECT 
  customer_id,
  calculate_clv(
    average_order_value, 
    purchase_frequency, 
    customer_lifespan, 
    acquisition_cost
  ) AS clv
FROM 
  `project_id.dataset_id.customers`;

Javascript

SELECT 
  customer_id,
  calculate_clv_js(
    average_order_value, 
    purchase_frequency, 
    customer_lifespan, 
    acquisition_cost
  ) AS clv
FROM 
  `project_id.dataset_id.customers`;


Both return: 

customer_id

clv

1

480.0

2

390.0

3

1760.0

Without a UDF

If you didn’t have a UDF, the query would be written as:

SELECT
   customer_id,
   average_order_value * purchase_frequency * customer_lifespan) - acquisition_cost AS clv
FROM 
   `project_id.dataset_id.customers`;

Why This Helps


In the example above it might seem longer to write the UDF instead of just writing the equation for CLV. There are a few reasons why you’d want to do this: 

  • Instead of repeating the CLV formula across a bunch of queries, you use calculate_clv() or calculate_clv_js() wherever it’s needed, reducing redundancy.

  • If the marketing team changes the acquisition cost calculation, you only update the UDF, and all dependent queries automatically reflect the change. Instead of having to manually update each. 


How I’m Using It

I’m using both at my work

  • Javascript for our reports (which uses a platform called Dataform that uses a combination of SQL and Javascript) 

  • SQL Based for custom queries or one off requests

I’m just in the beginning stages of using it but as time progresses I’ll keep you updated.

 

Conclusion

User-Defined Functions in BigQuery are a powerful way to streamline workflows because it saves time writing and updating queries, reduces risk of errors, and centralizes definitions. I’m personally excited to see how it helps in my workflow. Whether you use SQL or JavaScript, UDFs help you write cleaner, more efficient SQL and ensure your queries are reliable and adaptable.

© Kelly J. Adams - 2024

  • LinkedIn
  • GitHub
bottom of page