Sometimes the core functions are just not enough. Sometimes we need more power: the ability to hook in custom code and custom business logic.

Spectral does a great job with custom functions, so vacuum has adopted a very similar design to facilitate them.

Since v0.3.0, vacuum supports JavaScript-based functions as well as golang-based functions.


vacuum is written in golang. When building a JavaScript function, we need to remember that the JavaScript code is being read in by vacuum and then parsed and executed by the goja JavaScript engine.

There is no DOM available and there is limited access to Node.js APIs. This is not a V8 environment.

Since v0.22.0, vacuum supports async functions and Promises, including the built-in fetch() API for making HTTP requests. You can now write async function runRule(input) and use await inside your custom functions.


Structure of a JavaScript function

vacuum expects two required functions from a JavaScript custom function:

  1. A getSchema() function that returns metadata about the function
  2. A runRule() function that contains the actual validation logic (can be async)

Here’s the basic structure:

// Required: Define the function metadata
function getSchema() {
    return {
        "name": "myCustomFunction",  // This name is used in rulesets
        "description": "Validates that input matches expected value"
    };
}

// Required: Implement the validation logic  
function runRule(input) {
 // do something useful with custom logic, this is just a silly example 
 if (input !== 'some-value') {
   return [
     { message: 'something went wrong, input does not match "some-value"' }
   ];
 } 
}

In the above example, the runRule function accepts a single argument called input. This is the value of the given property that was located by the rule.

If the input does not match some-value, the function returns an array of objects. Each object is a model.RuleFunctionResult that contains a message property. This is the message that will be displayed in the linting report.

The input argument

The input can be either a primitive value, or a complex value (like an object or an array). It all depends on how the rule is configured to use the function, and what the given property is set to.

If the function only works on a single value, then the given property should be set to a JSON Path that points to a single value. If the function needs to work on an array of values, then the given property should be set to a JSON Path that points to an array, etc.
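
As a minimal sketch of the point above, a function can normalise its input so that it behaves the same whether the given path resolves to a single value or to an array. The function name checkNotEmpty and the empty-string check are hypothetical, and the sketch assumes array matches arrive as regular JavaScript arrays.

```javascript
// Hypothetical example: the name and the check are illustrative only.
function getSchema() {
    return {
        "name": "checkNotEmpty",
        "description": "Reports empty string values"
    };
}

function runRule(input) {
    // treat a single value as a one-element array (assumes arrays arrive as JS arrays)
    const values = Array.isArray(input) ? input : [input];
    const results = [];
    for (let i = 0; i < values.length; i++) {
        if (values[i] === "") {
            results.push({ message: "value at position " + i + " is empty" });
        }
    }
    return results;
}
```

Returning one result object per failing value means each problem appears as its own message in the report.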


Access to context

If the function needs to know what rule is calling it, or what the specification looks like, then the runRule function can access the context object. This is essentially a JSON rendered version of model.RuleFunctionContext.

There are two objects that are not available to JavaScript functions: the index and the document. The index is not available because it’s a complex struct with a lot of data in it and a large API. Without a full replacement SDK, we can’t make this available to JavaScript functions.

The document is not available for the same reason as the index.

Example function that uses context

function runRule(input) {
 // use context to determine the rule name and the given path
 if (input !== 'some-value') {
   return [
     { message: 'rule ' + context.rule.id + ' failed at ' + context.given }
   ];
 } 
}

The context object is automatically available to the function as a global. It does not need to be declared as an argument to the runRule function.

Context type

TypeScript is not (yet) supported, but here is what the type definitions for the context and related objects would be.

interface Context {
  ruleAction: RuleAction;
  rule: Rule;
  given: any;
  options: any;
  specInfo: SpecInfo;
}

SpecInfo type

interface SpecInfo {
  fileType: string;
  format: string;
  type: string;
  version: string;
}

RuleAction type

interface RuleAction {
  field: string;
  functionOptions: any;
}

Rule type

interface Rule {
  id: string;
  description: string;
  given: any;
  formats: string[];
  resolved: boolean;
  recommended: boolean;
  type: string;
  severity: string;
  then: any;
  ruleCategory: RuleCategory;
  howToFix: string;
}

RuleCategory type

interface RuleCategory {
  id:          string;
  name:        string;
  description: string;
}
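
To show how these fields might be read in practice, here is a minimal sketch that builds its message from context.rule and context.specInfo. The function name describeContext and the TODO check are hypothetical; only the field names come from the type definitions above, and the values they hold at runtime depend on the rule and the document being linted.

```javascript
// Hypothetical example: the function name and the check are illustrative only.
function getSchema() {
    return {
        "name": "describeContext",
        "description": "Reports TODO markers, with rule and spec details in the message"
    };
}

function runRule(input) {
    // only string values are checked in this sketch
    if (typeof input !== "string" || input.indexOf("TODO") === -1) {
        return [];
    }
    // context is the global provided by vacuum; field names follow the types above
    const spec = context.specInfo;
    return [
        {
            message: "rule '" + context.rule.id + "' (" + context.rule.severity + ") found a TODO; " +
                "spec format " + spec.format + ", version " + spec.version
        }
    ];
}
```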

Access to the function options

All functions can accept options from the rule that is calling them. Options are available from the functionOptions value. To access it, start with the context object; it’s under the ruleAction property.

Example rule and function

Let’s configure a ruleset that defines a custom function and passes in some function options.

rules:
  my-custom-js-rule:
    description: "adding function options to use in JS functions."
    given: $
    severity: error
    then:
      function: useFunctionOptions
      field: someField
      functionOptions:
         someOption: someValue

Now we can create a new file called useFunctionOptions.js and write the function logic.

// Required: Define the function metadata
function getSchema() {
    return {
        "name": "useFunctionOptions",
        "description": "Demonstrates using function options"
    };
}

// Required: Implement the validation logic
function runRule(input) {
    // extract function options from context
    const functionOptions = context.ruleAction.functionOptions

    // check if the 'someOption' value is set in our options
    if (functionOptions.someOption) {
        return [
            {
                message: "someOption is set to " + functionOptions.someOption,
            }
        ];
    } else {
        return [
            {
                message: "someOption is not set",
            }
        ];
    }
}

The message will result in: someOption is set to someValue

The given value in a ruleset is a JSON Path.

The getSchema() function (Required)

Since vacuum v0.17.x, the getSchema() function is required for JavaScript custom functions. Without it, your function will not be registered and cannot be used in rulesets.

The getSchema() function returns an object that defines metadata about your custom function, including the name that will be used to reference it in rulesets.

For example:

function getSchema() {
    return {
        "name": "myFunctionName",  // IMPORTANT: This is the name used in rulesets
        "description": "What this function does",
        "properties": [  // Optional: Define expected properties
            {
                "name": "propertyName",
                "description": "Description of this property"
            }
        ],
    };
}

Important: The name property in the schema is what you’ll use to reference the function in your rulesets. For example, if your schema returns "name": "checkApiVersion", you would use function: checkApiVersion in your ruleset (not the filename).


Calling core functions

Any custom JavaScript function can call any of the core functions.

This is done by adding a prefix vacuum_ to the function name. For example, to call the truthy function, the function name would be vacuum_truthy. Another function like schema would be vacuum_schema.

Each core function accepts two arguments, the first is the input value, and the second is the context object.

For example:

function runRule(input) {

    // create an array to hold the results
    let results = vacuum_truthy(input, context);

    results.push({
        message: "this is a message, added after truthy was called",
    });

    // return results.
    return results;
}

Tutorial

Make sure you have vacuum installed first.

Setting things up

Then create a new directory for your functions, and change into it.

mkdir my-functions && cd my-functions

Next, build a custom JavaScript function that will be used by vacuum. This is a simple function that checks if there is more than one path in an OpenAPI document.

Implement the logic

Create a new JavaScript file called checkSinglePath.js and write the function logic.

vi checkSinglePath.js

// Required: Define the function metadata
function getSchema() {
    return {
        "name": "checkSinglePath",  // This name will be used in rulesets
        "description": "Checks that only a single path exists in the API"
    };
}

// Required: Implement the validation logic
function runRule(input) {
  // get the number of keys passed in (each path is a key)
  const numPaths = Object.keys(input).length;
  if (numPaths > 1) {
    return [
      {
        message: 'More than a single path exists, there are ' + numPaths
      }
    ];
  }
}

That’s it! We’re ready to call the function from a rule.

Configuring functions

To use the function, we need to call it from a rule. Create a new RuleSet, or update an existing one, to include a new rule that calls the new custom function.

Example RuleSet

extends: [[vacuum:oas, off]]
documentationUrl: https://quobix.com/vacuum/rulesets/custom-rulesets
rules:
  sample-paths-rule:
    description: Load a custom function that checks for a single path
    severity: error
    recommended: true
    formats: [ oas2, oas3 ]
    given: $.paths
    then:
      function: checkSinglePath
    howToFix: use a spec with only a single path defined.

It’s critical that the function property in your ruleset’s then object matches the name property returned by your function’s getSchema() method. The filename is no longer used for function registration.

Run vacuum with ‘-f’

To run custom functions, vacuum needs to know where to look for them. There is a global flag -f or --functions that specifies a path to where your custom function plugin is located.

vacuum lint -r my-ruleset.yaml -f ./my-functions my-openapi-spec.yaml

There should be a message informing you that vacuum has located a function plugin, and how many functions were loaded.

Example output

When the rule runs, if there is more than a single path in the OpenAPI document, then the function will return a message.

For example:

More than a single path exists, there are 23

Async Functions & Promises

In v0.22.0, vacuum added support for asynchronous JavaScript functions using async/await syntax. It’s actually pretty cool; I needed to build a custom event loop to power goja.

This enables custom functions to perform time-delayed operations or handle any other asynchronous work.

Writing an Async Function

To write an async custom function, simply declare runRule as an async function:

function getSchema() {
    return {
        name: "myAsyncFunction",
        description: "An async function that does something asynchronous"
    };
}

async function runRule(input) {
    // You can use await inside async functions
    const response = await fetch("https://api.example.com/validate");
    const data = await response.json();

    if (!data.valid) {
        return [{
            message: "Validation failed: " + data.reason
        }];
    }

    return [];
}

The key differences:

  • Use async function runRule(input) instead of function runRule(input)
  • You can use await to wait for Promises to resolve
  • The function automatically returns a Promise

fetch() API

Alongside async functions, v0.22.0 added a built-in implementation of the Web Fetch API for making HTTP requests from custom JavaScript functions. This follows the standard Fetch API, with security restrictions to protect against malicious usage.

Basic Usage

async function runRule(input) {
    // Simple GET request
    const response = await fetch("https://api.example.com/data");

    // Check if request was successful
    if (!response.ok) {
        console.log("Request failed with status: " + response.status);
        return [];
    }

    // Parse JSON response
    const data = await response.json();

    // Use the data for validation
    if (data.someField !== input.expectedValue) {
        return [{
            message: "Value mismatch: expected " + data.someField
        }];
    }

    return [];
}

fetch() Options

The fetch() function accepts an optional second parameter with request options:

const response = await fetch("https://api.example.com/endpoint", {
    method: "POST",           // HTTP method: GET, POST, PUT, DELETE, etc.
    headers: {                // Custom headers
        "Content-Type": "application/json",
        "Authorization": "Bearer token123"
    },
    body: JSON.stringify({    // Request body (for POST/PUT)
        key: "value"
    }),
    redirect: "follow"        // Redirect mode: "follow", "error", or "manual"
});

Response Object

The Response object returned by fetch() provides several methods and properties:

const response = await fetch("https://api.example.com/data");

// Properties
console.log(response.ok);         // true if status 200-299
console.log(response.status);     // HTTP status code (200, 404, etc.)
console.log(response.statusText); // Status text ("OK", "Not Found", etc.)
console.log(response.url);        // Final URL (after redirects)
console.log(response.redirected); // true if redirected

// Methods to read response body (each can only be called once)
const json = await response.json();  // Parse as JSON
const text = await response.text();  // Read as plain text

// Access headers
const contentType = response.headers.get("Content-Type");
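
Because each body-reading method can only be called once, a function that wants JSON when possible but still needs the raw text for logging can read the body once with text() and parse it itself. The following is a minimal sketch of that idea; parseBody is a hypothetical helper name, and the URL is a placeholder.

```javascript
// Hypothetical helper: parse text as JSON, keeping the raw text either way.
function parseBody(text) {
    try {
        return { valid: true, json: JSON.parse(text), text: text };
    } catch (e) {
        // not valid JSON; keep the raw text so it can still be logged
        return { valid: false, json: null, text: text };
    }
}

async function runRule(input) {
    const response = await fetch("https://api.example.com/data");

    // read the body exactly once, then parse it ourselves
    const body = parseBody(await response.text());
    if (!body.valid) {
        console.log("Response was not JSON: " + body.text);
        return [];
    }

    // use body.json for validation...
    return [];
}
```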

Security Restrictions

For security reasons, fetch() has several important restrictions:

  • Only https:// URLs are allowed unless you enable HTTP mode with --allow-http
  • Requests to localhost, 10.x.x.x, 192.168.x.x, etc. are blocked without --allow-private-networks
  • Requests time out after 30 seconds by default unless changed with --fetch-timeout
  • TLS certificate verification is enforced unless disabled with --insecure
  • Response bodies are limited to 10MB

To override these restrictions:

# Allow HTTP (non-HTTPS) URLs (not recommended for production)
vacuum lint spec.yaml --allow-http

# Skip TLS certificate verification for HTTPS requests (different from --allow-http)
vacuum lint spec.yaml --insecure

# Allow requests to private/local networks
vacuum lint spec.yaml --allow-private-networks

# Set custom timeout (in seconds)
vacuum lint spec.yaml --fetch-timeout 60

Important distinction:

  • --allow-http allows fetch() to use HTTP (non-HTTPS) URLs
  • --insecure skips TLS certificate verification for HTTPS requests

These flags serve different purposes and can be used independently.

Complete Example

I built a fun little API, called the Sentiment Analyzer to demonstrate the power of this new feature.

It takes in descriptions and decides whether each one is too sad. If so, it returns a response with tooSad: true set.

This example validates that API descriptions aren’t overly negative by calling an external sentiment analysis API. You can see the ruleset on GitHub, as well as the sentiment check example source.

Note: This example uses per-node mode (the default). For better performance with multiple descriptions, see the Batch Mode section below which reduces 29 API calls to just 1.

This is an actual working API. It uses the VADER analysis technique, but runs much faster because it’s implemented in pure Go.

function getSchema() {
    return {
        name: "sentimentCheck",
        description: "Checks if API descriptions are too negative using sentiment analysis"
    };
}

async function runRule(input) {
    // skip empty or non-string inputs
    if (!input || typeof input !== "string") {
        return [];
    }

    // call sentiment analysis API
    const response = await fetch("https://api.pb33f.io/sentiment-check", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            text: input
        })
    });

    if (!response.ok) {
        console.log("Sentiment API error: " + response.status);
        return [];
    }

    const data = await response.json();

    // check if sentiment is too negative
    if (data.sentiment === "negative" && data.score < -0.5) {
        const words = data.negativeWords || [];
        const wordList = words.map(function(w) {
            return "'" + w + "'";
        }).join(", ");

        let message = "Description is too negative";
        if (wordList) {
            message += ", words like " + wordList + " lower the overall tone";
        }

        return [{ message: message }];
    }

    return [];
}

You can run this yourself; the API is real. Check out the code and run this from the vacuum root:

vacuum lint model/test_files/sad-spec.yaml -r rulesets/examples/sentiment-check.yaml -f plugin/sample/js -d

You will see custom JavaScript, using a real HTTP API, configured by a ruleset and returning results as violations.


Error Handling

It’s simple to handle fetch errors:

async function runRule(input) {
    try {
        const response = await fetch("https://api.example.com/data");

        if (!response.ok) {
            // Handle HTTP errors (404, 500, etc.)
            console.log("API returned status: " + response.status);
            return [];
        }

        const data = await response.json();
        // Process data...

    } catch (error) {
        // Handle network errors, timeouts, etc.
        console.log("Fetch error: " + error.message);
        return [];
    }
}

Fetch Configuration via CLI

The following global flags control fetch() behavior:

  • --allow-http - Allow fetch() to use HTTP (non-HTTPS) URLs (not recommended for production)
  • --insecure - Skip TLS certificate verification for HTTPS requests (different from --allow-http)
  • --allow-private-networks - Allow fetch() to access localhost and private IP ranges (10.x, 192.168.x, etc.)
  • --fetch-timeout - Set timeout for fetch() requests in seconds (default: 30)
  • --cert-file, --key-file, --ca-file - Configure TLS certificates for HTTPS requests

Note: --allow-http and --insecure serve different purposes:

  • Use --allow-http to allow HTTP URLs (e.g., http://example.com)
  • Use --insecure to skip certificate verification for HTTPS URLs with invalid/self-signed certificates

Example: vacuum lint spec.yaml -f ./functions --allow-private-networks --fetch-timeout 60


Batch Mode Processing

Since v0.22.0, vacuum supports batch mode for custom JavaScript functions. This allows a single function call to process ALL JSONPath matches at once, instead of calling the function separately for each match.

Batch mode results MUST include the original input object. If you omit the input field or provide an invalid input object, vacuum will generate an error result explaining this requirement.

This is essential so that vacuum knows how to map your custom violation back to the node located via the given path.

return [{ message: "error", input: inputs[i] }];

Batch mode is only available to custom JavaScript functions. The same concept does not apply to Go custom functions.

When to Use Batch Mode

Batch mode is ideal for functions that:

  • Call external APIs
  • Have high overhead
  • Need context across matches
  • Use rate-limited services

Enabling Batch Mode

To enable batch mode, add batch: true to the functionOptions in your ruleset:

rules:
  my-batch-rule:
    description: Process all matches in a single batch
    given: $..description
    severity: warn
    then:
      function: myBatchFunction
      functionOptions:
        batch: true  # enable batch mode

Batch Mode Function Signature

When batch mode is enabled, the runRule function receives an array of objects instead of a single value:

async function runRule(inputs) {
    // inputs is an array: [{value: "...", index: 0}, {value: "...", index: 1}, ...]

    // each input object has:
    // - value: The matched value from the JSONPath
    // - index: The position in the match list (0-based)
}

Batch Mode Return Format

Functions in batch mode MUST return an array of result objects that include the original input object:

async function runRule(inputs) {
    // collect results; hasError() below is a placeholder for your own check
    var results = [];

    // process inputs...
    for (var i = 0; i < inputs.length; i++) {
        if (hasError(inputs[i].value)) {
            results.push({
                message: "Error found",
                input: inputs[i]  // REQUIRED: must include the original input object
            });
        }
    }

    return results;
}
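
Batch mode also suits checks that need to compare matches with one another, with no external API involved. The sketch below flags any matched string that repeats an earlier one. The function name duplicateCheckBatch and the helper findDuplicates are hypothetical; the input shape ({value, index}) and the required input field follow the format described above.

```javascript
// Hypothetical example: names are illustrative only.
function getSchema() {
    return {
        name: "duplicateCheckBatch",
        description: "Flags matched values that repeat an earlier match (batch mode)"
    };
}

// Returns the inputs whose string value repeats that of an earlier input.
function findDuplicates(inputs) {
    var firstSeen = {};   // prefixed value -> index of the first input with that value
    var duplicates = [];
    for (var i = 0; i < inputs.length; i++) {
        var input = inputs[i];
        if (typeof input.value !== "string") {
            continue;  // skip non-string matches
        }
        var key = "v:" + input.value;  // prefix avoids clashes with built-in property names
        if (Object.prototype.hasOwnProperty.call(firstSeen, key)) {
            duplicates.push({ input: input, firstIndex: firstSeen[key] });
        } else {
            firstSeen[key] = input.index;
        }
    }
    return duplicates;
}

async function runRule(inputs) {
    return findDuplicates(inputs).map(function (d) {
        return {
            message: "value repeats the one at match " + d.firstIndex,
            input: d.input  // REQUIRED: the original input object
        };
    });
}
```

Keeping the comparison in a plain helper makes it easy to exercise outside vacuum, while runRule only shapes the results.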

Try It Yourself: Batch vs Per-Node Mode

vacuum includes working examples you can run to see the difference between batch and per-node modes. Both examples use the Sentiment Analyzer API to check if API descriptions are too negative.

Run the per-node mode example (29 API calls, ~3 seconds):

vacuum lint model/test_files/sad-spec.yaml \
  -r rulesets/examples/sentiment-check.yaml \
  -f plugin/sample/js

Run the batch mode example (1 API call, ~100 milliseconds):

vacuum lint model/test_files/sad-spec.yaml \
  -r rulesets/examples/sentiment-check-batch.yaml \
  -f plugin/sample/js

Both produce identical results (18 warnings), but batch mode is ~33x faster because it makes a single API call instead of 29 separate calls.

Batch Mode Example Files

The example files are included in the vacuum repository:

  • Batch function: plugin/sample/js/sentiment_check_batch.js
  • Batch ruleset: rulesets/examples/sentiment-check-batch.yaml
  • Per-node function: plugin/sample/js/sentiment_check.js
  • Per-node ruleset: rulesets/examples/sentiment-check.yaml
  • Test spec: model/test_files/sad-spec.yaml

Batch Mode Ruleset Configuration

extends: [[vacuum:oas, off]]
rules:
  sentiment-check-batch:
    description: Check if API descriptions are too negative (batch mode)
    severity: warn
    formats: [oas2, oas3]
    given: $..description
    then:
      function: sentimentCheckBatch
      functionOptions:
        batch: true  # Process all descriptions at once

Batch Mode Function

function getSchema() {
    return {
        name: "sentimentCheckBatch",
        description: "Checks if API descriptions are too negative using sentiment analysis (batch mode)"
    };
}

async function runRule(inputs) {
    // build request body and a lookup map: id -> original input object
    var requestBody = [];
    var idToInput = {};

    for (var i = 0; i < inputs.length; i++) {
        var input = inputs[i];
        if (input.value && typeof input.value === "string") {
            requestBody.push({
                id: input.index,
                description: input.value
            });
            // store original input for result mapping (REQUIRED for batch mode)
            idToInput[input.index] = input;
        }
    }

    // nothing to do
    if (requestBody.length === 0) {
        return [];
    }

    // single API call with all descriptions
    var response = await fetch("https://api.pb33f.io/sentiment-check", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(requestBody)
    });

    if (!response.ok) {
        console.log("Sentiment API error: " + response.status);
        return [];
    }

    var data = await response.json();

    if (!data.results || data.results.length === 0) {
        return [];
    }

    // map API results back to vacuum results
    var results = [];
    for (var j = 0; j < data.results.length; j++) {
        var result = data.results[j];

        // only report violations
        if (!result.tooSad) {
            continue;
        }

        // format negative words
        var words = result.negativeWords || [];
        var wordList = words.map(function(w) { return "'" + w + "'"; }).join(", ");

        var message = "description is too negative";
        if (wordList) {
            message += ", words like " + wordList + " are lowering your overall tone";
        }

        // REQUIRED: return result with original input object for deterministic node mapping
        // vacuum uses input.index to map back to the correct node
        results.push({
            message: message,
            input: idToInput[result.id]  // maps back to the original node
        });
    }

    return results;
}

Migrating to Batch Mode

If you have an existing per-node function that calls an external API:

Before (Per-Node):

async function runRule(input) {
    const response = await fetch("https://api.example.com/check", {
        method: "POST",
        body: JSON.stringify({ text: input })
    });

    const result = await response.json();
    if (result.hasError) {
        return [{ message: result.error }];
    }
    return [];
}

After (Batch Mode):

async function runRule(inputs) {
    // Build batch request and create lookup map
    var batch = [];
    var idToInput = {};

    for (var i = 0; i < inputs.length; i++) {
        batch.push({ id: inputs[i].index, text: inputs[i].value });
        idToInput[inputs[i].index] = inputs[i];  // Preserve input objects
    }

    const response = await fetch("https://api.example.com/check-batch", {
        method: "POST",
        body: JSON.stringify(batch)
    });

    const results = await response.json();

    // Map results back with original input objects (REQUIRED)
    return results
        .filter(r => r.hasError)
        .map(r => ({
            message: r.error,
            input: idToInput[r.id]  // Include original input object
        }));
}

And add batch: true to your ruleset’s functionOptions.