Wednesday, October 7, 2020

Elegant R Runtime Solution for Multithreading


A coworker came up with an elegant solution to an issue we’ve been facing with the R programming language, a popular open source environment for statistical computing. We run multiple R algorithms on a server using the Plumber framework, which converts existing R code into web APIs by adding a couple of special comments. A key issue with R is that it’s single threaded (Microsoft’s solution is R Open). In a single-threaded language, only one block of code inside an application can execute at any given time; while that block is executing, all other code is blocked. This isn’t typically an issue unless a block of code takes a long time, such as a complex algorithm or a call over a network. Our application, which supports multiple concurrent users, does both, so being single threaded won’t scale for us.

One solution I’ve been pursuing over the last year is the AWS Lambda service, which allows standalone functions to execute in the cloud. The challenge with Lambda is that it doesn’t natively run R code; it’s better suited to Java, Python, and other languages. While it’s technically feasible to run R inside Lambda, we hadn’t yet implemented it because it’s non-trivial.


The solution my coworker came up with was to run a Node.js server that simply calls our R scripts as shell commands on the host operating system. Each script runs in its own process, so one won’t block another. And even though Node.js is itself single threaded, it uses non-blocking input/output calls, which allow it to support tens of thousands of concurrent connections without incurring the cost of thread context switching.
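
To make the idea concrete, here is a minimal sketch of what such a server might look like. It is not our production code: the script name, the URL layout, the timeout, and the assumption that Rscript is on the PATH are all made up for illustration.

```javascript
const http = require("http");
const { execFile } = require("child_process");

// Hypothetical mapping from URL path to R script (not from our real setup).
// Only scripts listed here can be run, so callers cannot execute arbitrary files.
const SCRIPTS = new Map([["forecast", "scripts/forecast.R"]]);

// Run a command in its own OS process and resolve with whatever it prints.
// execFile is asynchronous, so Node.js keeps serving other requests meanwhile.
function runProcess(command, args, timeoutMs = 60000) {
  return new Promise((resolve, reject) => {
    execFile(command, args, { timeout: timeoutMs }, (err, stdout, stderr) => {
      if (err) reject(new Error(stderr || err.message));
      else resolve(stdout);
    });
  });
}

// Build the HTTP server. By default each request launches `Rscript <script>`
// (assumes Rscript is on the PATH); `run` can be swapped out for testing.
function createServer(run = (script) => runProcess("Rscript", [script])) {
  return http.createServer(async (req, res) => {
    const name = new URL(req.url, "http://localhost").pathname.slice(1);
    const script = SCRIPTS.get(name);
    if (!script) {
      res.writeHead(404, { "Content-Type": "text/plain" });
      res.end("unknown algorithm\n");
      return;
    }
    try {
      const output = await run(script);
      res.writeHead(200, { "Content-Type": "text/plain" });
      res.end(output);
    } catch (err) {
      res.writeHead(500, { "Content-Type": "text/plain" });
      res.end(String(err.message));
    }
  });
}
```

Starting it would be a matter of calling createServer().listen(8000) and requesting http://localhost:8000/forecast (port and path are placeholders). Two slow requests arriving at the same time each get their own Rscript process, so neither waits on the other.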

Simple, elegant, and trivial to implement.
