
One of the hardest concepts to wrap our heads around when we first learn JavaScript is the asynchronous processing model of the language. For most of us, learning asynchronous programming looks pretty much like this:

If your first time working with async wasn’t like this, please consider yourself a genius

As hard as it is to pick up, async programming is critical for anyone who wants to use JavaScript/Node.js to build web applications and servers, as JavaScript code is asynchronous by default.

The fundamentals

So what exactly is the asynchronous processing model, or the non-blocking I/O model, which we must all have heard of as Node.js users? A tl;dr description would be: in an async processing model, when our application engine interacts with external parties (e.g. the file system or the network), instead of waiting until it gets a result back, it continues with subsequent tasks and only comes back to the prior task once it receives a signal that a result is ready.

To understand the default async processing model of Node.js, let’s have a look at a hypothetical Santa’s workshop. Before any work can begin, Santa will have to read each of the lovely letters from kids around the world.

Santa reading letter for workshop

He will then deduce the requested gift, translate the item name into the Elvish language, and then pass the instruction to each of our hard working elves who have different specialisations: wooden toys for Red, stuffed toys for Blue, and robotic toys for Green.

Santa passing instruction to Red

This year, due to the COVID-19 pandemic, only half of Santa’s elves can come to his workshop to help. Nonetheless, as wise as he is, Santa decides that, instead of waiting for each elf to finish preparing a gift (i.e. synchronously), he will keep translating and passing out instructions from his pile of letters.

Santa passing instruction to Blue

So on and so forth…

Santa continue passing out instructions

As he is just about to read another letter, Red informs Santa that he has finished preparing the first gift. Santa then receives the present from Red and puts it to one side…

Santa receiving Red’s present

…before continuing with translating and passing instructions from the next letter.

Santa passing instruction to Green

As he only needs to wrap a pre-made flying robot, Green quickly finishes the preparation and passes the present to Santa.

Santa receiving Green’s present

After a whole day of hard and asynchronous work, Santa and the elves manage to complete all the present preparation. With his improved asynchronous way of working, Santa’s workshop finishes in record time despite being hard-hit by the pandemic.

Santa’s gotten all the presents

So that’s the basic idea of asynchronous or non-blocking I/O processing model. How is this done in Node.js specifically?

The Node.js event loop

Most of us have probably heard that “Node.js is single-threaded”. To be exact, however, only the event loop in Node.js, which interacts with a pool of background C++ worker threads, is single-threaded. There are 4 important components to the Node.js processing model:

  • Call Stack: The stack of function frames that the main thread is currently executing
  • Event Queue: Tasks that are declared in a program, or returned from the worker thread pool via callbacks (the equivalent in Santa’s workshop is the pile of letters for Santa)
  • Event Loop: The main Node.js thread that coordinates the event queue and the worker thread pool to carry out operations, both async and synchronous (this is Santa 🎅)
  • Background thread pool: These threads do the actual processing of tasks, which might be I/O blocking (e.g. calling and waiting for a response from an external API) (these are the hardworking elves 🧝🧝‍♀️🧝‍♂️ from our workshop)

This processing model can be visualised as below:

Diagram courtesy of c-sharpcorner.com

Let’s look at an actual snippet of code to see the above in action

const https = require("https");

console.log("Hello");
https.get("https://httpstat.us/200", (res) => {
  console.log(`API returned status: ${res.statusCode}`);
});
console.log("from the other side");

If we execute the above piece of code, we would get this in our standard output

Hello
from the other side
API returned status: 200

So how does the Node.js engine carry out the above snippet of code? It starts with 3 function calls in the call stack.

Processing starts with 3 functions in the call stack

“Hello” is then printed to the console with the corresponding function call removed from the stack.

Hello console log removed from stack

The function call to https.get (i.e. making a get request to the corresponding URL) is then executed and delegated to the worker thread pool with a callback attached.

https.get delegated to worker pool

The next function call to console.log gets executed and “from the other side” is printed to the console.

Next console.log get executed

Now that the network call has returned a response, the callback function gets queued in the callback queue. Note that this step could, though it normally is not the case, happen before the immediately previous step (i.e. before “from the other side” gets printed).

Network call completes and callback queued

The callback then gets put into our call stack

Callback put inside call stack

and then we will see “API returned status: 200” in our console

Status code printed out

By coordinating the callback queue and the call stack, the event loop in Node.js efficiently executes our JavaScript code in an asynchronous manner.
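This queue-then-stack behaviour is easy to observe with a zero-delay timer: even with a delay of 0 ms, the callback is only pulled from the queue after the current call stack is empty, so the synchronous statements always run first.

```javascript
const order = [];

order.push("first");

// Queued in the callback queue; runs only once the call stack is empty.
setTimeout(() => {
  order.push("third");
  console.log(order.join(" -> ")); // first -> second -> third
}, 0);

order.push("second");
```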

A synchronous history of JavaScript & Node.js async/await

Now that we have a good understanding of asynchronous execution and the inner workings of the Node.js event loop, let’s dive into the async/await implementations of JavaScript through time, from the original callback-driven implementation to the latest shiny async/await keywords.

Callbacks

The OG way of handling the asynchronous nature of JavaScript engines is through callbacks. Callbacks are simply functions that are executed, usually, at the end of synchronous or I/O blocking operations. A straightforward example of this pattern is the built-in setTimeout function, which waits a certain number of milliseconds before executing its callback.

setTimeout(() => {
  console.log("Hello");
}, 2000);

While it’s convenient to just attach callbacks to blocking operations, this pattern also introduces a couple of problems:

  • Callback hell
  • Inversion of control (not the good kind!!)

Callback hell

Let’s look at an example with our Santa and his elves again. To prepare a present, Santa’s workshop has to carry out a few different steps, each taking a different duration (simulated here using setTimeout)

function translateLetter(letter, callback) {
  return setTimeout(() => {
    callback(letter.split("").reverse().join(""));
  }, 2000);
}
function assembleToy(instruction, callback) {
  return setTimeout(() => {
    const toy = instruction.split("").reverse().join("");
    if (toy.includes("wooden")) {
      return callback(`polished ${toy}`);
    } else if (toy.includes("stuffed")) {
      return callback(`colorful ${toy}`);
    } else if (toy.includes("robotic")) {
      return callback(`flying ${toy}`);
    }
    callback(toy);
  }, 3000);
}
function wrapPresent(toy, callback) {
  return setTimeout(() => {
    callback(`wrapped ${toy}`);
  }, 1000);
}

These steps need to be carried out in a specific order

translateLetter("wooden truck", (instruction) => {
  assembleToy(instruction, (toy) => {
    wrapPresent(toy, console.log);
  });
});
// This will produce a "wrapped polished wooden truck" as the final result

As we do things this way, adding more steps to the process means pushing the inner callbacks further rightward, ending up with a callback hell like this.

Callback Hell
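To illustrate, here is a hypothetical version of the workshop with two extra steps (labelPresent and loadSleigh are made up for this sketch, and all the step bodies are stubs); every new step pushes the code one level further to the right:

```javascript
// Each step is a stub that just tags its input and calls back asynchronously.
const step = (tag) => (input, callback) =>
  setTimeout(() => callback(`${tag} ${input}`), 10);

const translateLetter = step("translated");
const assembleToy = step("assembled");
const wrapPresent = step("wrapped");
const labelPresent = step("labeled"); // hypothetical extra step
const loadSleigh = step("loaded");    // hypothetical extra step

translateLetter("wooden truck", (instruction) => {
  assembleToy(instruction, (toy) => {
    wrapPresent(toy, (present) => {
      labelPresent(present, (labeled) => {
        loadSleigh(labeled, console.log);
      });
    });
  });
});
```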

Callbacks look very sequential, but at times the execution order doesn’t follow what’s shown on our screen. With multiple layers of nested callbacks, we lose track of the big picture of the whole program flow, and we produce more bugs or just become slower at writing our code.

So how do we solve this problem? Simply modularise the nested callbacks into named functions, and we will have a nicely left-aligned program that’s easy to read.

function assembleCb(toy) {
  wrapPresent(toy, console.log);
}
function translateCb(instruction) {
  assembleToy(instruction, assembleCb);
}
translateLetter("wooden truck", translateCb);

Inversion of Control

Another problem with the callback pattern is that we don’t get to decide how the higher-order functions execute our callbacks. They might execute them at the end of the function, which is conventional, but they could also execute them at the start, or execute them multiple times. Basically, we are at the mercy of our dependency owners, and we might never know when they will break our code.

To solve this problem, as a dependency user, there’s not much we can do about it. However, if we’re ever in the seat of a dependency owner, please always:

  • Stick to the conventional callback signature, with an error as the first argument
  • Execute the callback only once, at the end of your higher-order function
  • Document anything out-of-convention that is absolutely required, and always aim for backward compatibility
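As a sketch of what the convention looks like in practice (elvishTranslate is a made-up example function), a well-behaved higher-order function passes any error first, the result second, and fires the callback exactly once, at the end:

```javascript
function elvishTranslate(letter, callback) {
  if (typeof letter !== "string") {
    // Convention: errors go in the first argument.
    return callback(new Error("letter must be a string"));
  }
  // Convention: success passes null as the error, the result second,
  // and the callback fires exactly once, at the end.
  callback(null, letter.split("").reverse().join(""));
}

elvishTranslate("wooden truck", (err, instruction) => {
  if (err) throw err;
  console.log(instruction); // "kcurt nedoow"
});
```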

Promises

Promises were created to solve the above-mentioned problems with callbacks. Promises force JavaScript users to:

  • Stick to a specific convention with their signature resolve and reject functions.
  • Chain the callback functions to a well-aligned and top-down flow.

Our previous example with Santa’s workshop preparing presents can be rewritten with promises like so

function translateLetter(letter) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      resolve(letter.split("").reverse().join(""));
    }, 2000);
  });
}
function assembleToy(instruction) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      const toy = instruction.split("").reverse().join("");
      if (toy.includes("wooden")) {
        return resolve(`polished ${toy}`);
      } else if (toy.includes("stuffed")) {
        return resolve(`colorful ${toy}`);
      } else if (toy.includes("robotic")) {
        return resolve(`flying ${toy}`);
      }
      resolve(toy);
    }, 3000);
  });
}
function wrapPresent(toy) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      resolve(`wrapped ${toy}`);
    }, 1000);
  });
}

with the steps being carried out nicely in a chain

translateLetter("wooden truck")
  .then((instruction) => {
    return assembleToy(instruction);
  })
  .then((toy) => {
    return wrapPresent(toy);
  })
  .then(console.log);
// This would produce the exact same present: wrapped polished wooden truck

However, promises are not without problems either. Data in each link of our chain has a different scope, and only has access to data passed from the immediately previous step or the parent scope. For example, our gift-wrapping step might want to use data from the translation step

function wrapPresent(toy, instruction) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      resolve(`wrapped ${toy} with instruction: "${instruction}"`);
    }, 1000);
  });
}

This is rather a classic “memory sharing” problem with threads. To solve it, instead of using variables in the parent scope, we should make use of Promise.all and “share data by communicating, rather than communicate by sharing data”.

translateLetter("wooden truck")
  .then((instruction) => {
    return Promise.all([assembleToy(instruction), instruction]);
  })
  .then(([toy, instruction]) => {
    return wrapPresent(toy, instruction);
  })
  .then(console.log);
// This would produce the present: wrapped polished wooden truck with instruction: "kcurt nedoow"

Async/Await

Last but definitely not least, the shiniest kid on the block - async/await - is very easy to use, but also carries some risk of misuse.

Async/await solves the memory sharing problems of promises by having everything under the same scope. Our previous example can be rewritten easily like so

(async function main() {
  const instruction = await translateLetter("wooden truck");
  const toy = await assembleToy(instruction);
  const present = await wrapPresent(toy, instruction);
  console.log(present);
})();
// This would produce the present: wrapped polished wooden truck with instruction: "kcurt nedoow"

However, as easy as it is to write asynchronous code with async/await, it’s also easy to make mistakes that create performance loopholes. Let’s now narrow our Santa’s workshop scenario down to wrapping presents and loading them onto the sleigh.

function wrapPresent(toy) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      resolve(`wrapped ${toy}`);
    }, 5000 * Math.random());
  });
}
function loadPresents(presents) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      let itemList = "";
      for (let i = 0; i < presents.length; i++) {
        itemList += `${i}. ${presents[i]}\n`;
      }
      resolve(itemList);
    }, 5000);
  });
}

A common mistake we might make is carrying out the steps this way

(async function main() {
  const presents = [];
  presents.push(await wrapPresent("wooden truck"));
  presents.push(await wrapPresent("flying robot"));
  presents.push(await wrapPresent("stuffed elephant"));
  const itemList = await loadPresents(presents);
  console.log(itemList);
})();

But does Santa need to await each of the presents being wrapped, one by one, before loading them? Definitely not! The presents should be wrapped concurrently. We often make this mistake because it’s so easy to write await without thinking about the blocking nature of the keyword.

To solve this problem, we should bundle the gift wrapping steps together and execute them all at once

(async function main() {
  const presents = await Promise.all([
    wrapPresent("wooden truck"),
    wrapPresent("flying robot"),
    wrapPresent("stuffed elephant"),
  ]);
  const itemList = await loadPresents(presents);
  console.log(itemList);
})();

Some recommended steps to tackle concurrency performance issues in our Node.js code are:

  • Identify hotspots with multiple consecutive awaits in our code
  • Check if they depend on each other (i.e. one function uses data returned from another)
  • Make independent function calls concurrent with Promise.all

Wrapping up (the article, not Christmas presents 😂)

Congratulations on reaching the end of this article! I tried my best to keep this post short, but the async topic in JavaScript is just so broad. Some tl;dr key takeaways:

  • Modularise our JavaScript callbacks to avoid callback hell
  • Stick to the convention for JS callbacks
  • Share data by communicating, through Promise.all, when using promises
  • Be careful about the performance implication of async/await code
  • We ❤️ JavaScript