<aside> 💡 Part of the Dataland Learning Roadmap

</aside>

Dataland lets you write serverless TypeScript functions that fetch data on a recurring basis and write it to a Dataland table, where users get an interactive UI to search, sort, filter, and create views on top of the data.

The Dataland team has written many of these functions into open-source modules here, which are free for you to install. Examples include data sync modules for Airtable, BigQuery, Hubspot, Shopify, and more. See Install a “beta” module via Dataland CLI for more details.

This guide walks you through writing your own module to sync external data from any source that exposes a REST API. We’ll use the JSON Placeholder comments API (https://jsonplaceholder.typicode.com/comments) as the example.

The end result will look something like this:

CleanShot 2022-11-30 at 11.27.02.gif

As usual, you can see the full code for this example on GitHub.

Prerequisites

Instructions

  1. Run npm install in the root of your git repo to install necessary dependencies.

  2. Create a new, blank TypeScript file in the src/ folder of your repo named tableSync_jsonPlaceholder.ts.

  3. Add the following lines to import the necessary libraries:

    // Import methods from Dataland SDK
    import {
      registerCronHandler,
      getDbClient,
      TableSyncRequest,
    } from "@dataland-io/dataland-sdk";
    
    // Import Apache Arrow library functions, which help transform
    // JSON → Arrow which Dataland uses to store data
    import { tableFromJSON, tableToIPC } from "apache-arrow";
    
  4. Define the function that fetches comments from the JSON Placeholder API. Add the following lines to the file, below the import statements:

    const getDataFromJSONPlaceholder = async () => {
      // Pass the plain URL; angle brackets around it would make the request fail
      const response = await fetch("https://jsonplaceholder.typicode.com/comments");
      const records = await response.json();
    
      // Reformat object keys to match Dataland's column naming conventions
      // Dataland columns can only contain lowercase [a-z], [0-9], and underscores
      const records_reformatted = records.map((record: any) => {
        return {
          comment_id: record.id,
          post_id: record.postId,
          name: record.name,
          email: record.email,
          body: record.body,
        };
      });
    
      return records_reformatted;
    };
    
  5. With this function defined, we now need a “handler” function that calls getDataFromJSONPlaceholder and writes the results to a Dataland table.

Add the following lines to the file:

```tsx
const handler = async () => {
  // Init Dataland DB client to interact with Dataland
  const db = await getDbClient();

  // Get data from JSONPlaceholder Comments API
  const records = await getDataFromJSONPlaceholder();

  if (records == null) {
    return;
  }

  // Transform arbitrary JSON array of objects to Arrow format (Dataland's data format)
  const table = tableFromJSON(records);
  const batch = tableToIPC(table);

  // Create a TableSyncRequest to sync data to Dataland
  // This creates or updates a table in Dataland
  const tableSyncRequest: TableSyncRequest = {
    tableName: "jsonplaceholder_comments",
    arrowRecordBatches: [batch],
    primaryKeyColumnNames: ["comment_id"],
    // If you want to append columns to an existing table, set this to false
    dropExtraColumns: false,
    // If you want to append rows to an existing table, set this to false
    deleteExtraRows: true,
    transactionAnnotations: {},
    tableAnnotations: {},
    columnAnnotations: {},
  };

  await db.tableSync(tableSyncRequest);
};
```
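Before handing rows to `tableSync`, it can help to check that they respect the two constraints this guide relies on: column names made only of lowercase a-z, 0-9, and underscores, and a primary key column that identifies each row. The sketch below is one way to do that. The helper names (`findInvalidColumnNames`, `findDuplicateKeys`, `assertSyncable`) are hypothetical and not part of the Dataland SDK, and the uniqueness check assumes that primary key values must not repeat, which this guide does not state explicitly.

```typescript
// Hypothetical helpers for illustration; they are not part of the Dataland SDK.

type Row = Record<string, unknown>;

// Column names may only contain lowercase a-z, 0-9, and underscores.
const COLUMN_NAME_PATTERN = /^[a-z0-9_]+$/;

// Return every key, across all rows, that breaks the naming convention.
const findInvalidColumnNames = (rows: Row[]): string[] => {
  const invalid = new Set<string>();
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!COLUMN_NAME_PATTERN.test(key)) {
        invalid.add(key);
      }
    }
  }
  return Array.from(invalid);
};

// Return the primary key values that appear more than once.
// Assumption: primary key values are expected to be unique per row.
const findDuplicateKeys = (rows: Row[], keyColumn: string): unknown[] => {
  const seen = new Set<unknown>();
  const duplicates = new Set<unknown>();
  for (const row of rows) {
    const value = row[keyColumn];
    if (seen.has(value)) {
      duplicates.add(value);
    }
    seen.add(value);
  }
  return Array.from(duplicates);
};

// Throw a descriptive error if the rows fail either check.
const assertSyncable = (rows: Row[], keyColumn: string): void => {
  const invalid = findInvalidColumnNames(rows);
  if (invalid.length > 0) {
    throw new Error(`Invalid column names: ${invalid.join(", ")}`);
  }
  const duplicates = findDuplicateKeys(rows, keyColumn);
  if (duplicates.length > 0) {
    throw new Error(`Duplicate ${keyColumn} values: ${duplicates.join(", ")}`);
  }
};
```

In the handler above, a call such as `assertSyncable(records, "comment_id")` placed just before `tableFromJSON(records)` would surface these problems before any data is written.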
  6. Finally, we can call this handler on a recurring schedule. To do this, add one final line to the bottom of the file:

    registerCronHandler(handler);
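Step 4 types each API record as `any`, so an unexpected response (for example, an error object instead of an array) would only show up later as missing or malformed columns. As a rough sketch, the mapping could be paired with a runtime shape check. `CommentRow` and `parseComments` are hypothetical names introduced here for illustration; the field list mirrors the keys used in step 4.

```typescript
// Hypothetical helper for illustration; it is not part of the Dataland SDK.

// One row as written to the Dataland table in this guide.
interface CommentRow {
  comment_id: number;
  post_id: number;
  name: string;
  email: string;
  body: string;
}

const isObject = (value: unknown): value is Record<string, unknown> =>
  typeof value === "object" && value !== null;

// Validate the parsed JSON payload and map it to Dataland-style rows.
// Throws if the payload is not an array or an entry has an unexpected shape.
const parseComments = (payload: unknown): CommentRow[] => {
  if (!Array.isArray(payload)) {
    throw new Error("Expected the comments API to return a JSON array");
  }
  return payload.map((entry: unknown, index: number): CommentRow => {
    if (!isObject(entry)) {
      throw new Error(`Entry at index ${index} is not an object`);
    }
    const { id, postId, name, email, body } = entry;
    if (
      typeof id !== "number" ||
      typeof postId !== "number" ||
      typeof name !== "string" ||
      typeof email !== "string" ||
      typeof body !== "string"
    ) {
      throw new Error(`Unexpected comment shape at index ${index}`);
    }
    // Rename keys to lowercase, underscore-separated column names
    return { comment_id: id, post_id: postId, name, email, body };
  });
};
```

With this in place, `getDataFromJSONPlaceholder` could end with `return parseComments(await response.json());` in place of the inline `map`. Throwing on a malformed payload is a design choice in this sketch; skipping bad entries would be an equally reasonable alternative.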