The Problem
The R programming language is a popular open-source environment for statistical computing. CANA runs multiple R algorithms on a server using the Plumber framework, which converts existing R code into web APIs by annotating it with a few special comments. A key issue with R is that it is single-threaded: within a single R process, only one block of code can execute at any given time, and all other code waits until that block finishes. This is not typically a problem unless a block of code takes a long time, such as a complex algorithm or a call over a network. CANA’s application does both, and it supports multiple concurrent users, so a single-threaded server does not scale for us.
One solution I pursued over the last year was the AWS Lambda service, which allows standalone functions to execute in the cloud. The challenge with Lambda is that it does not natively run R code and is better suited to Java, Python, and other languages. While it is technically feasible to run R inside Lambda, we had not yet implemented it because doing so is non-trivial: R has to run inside a Python container with supporting frameworks and libraries.
R Multithreading Solution
A coworker came up with an elegant solution to this problem: run a Node.js server that simply invokes the R scripts as shell commands on the host operating system. Each script runs in its own process, so one won’t block another. And even though Node.js is itself single-threaded, it uses non-blocking input/output with callbacks, which allows it to support tens of thousands of concurrent connections without incurring the cost of thread context switching.
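The approach described above can be sketched roughly as follows. This is a minimal illustration, not CANA's actual server: the endpoint path `/forecast`, the script name `forecast.R`, the 60-second timeout, and the helper names `runScript` and `createServer` are all assumptions made for the example. It relies only on Node's built-in `http` and `child_process` modules and on the standard `Rscript` command-line front end being available on the host.

```javascript
const { execFile } = require('node:child_process');
const http = require('node:http');

// Run one command in its own operating-system process and resolve with its
// standard output. Node does not wait on the child; the callback fires when
// the child exits, so other requests keep being served in the meantime.
function runScript(command, args) {
  return new Promise((resolve, reject) => {
    // The 60-second timeout is an arbitrary value chosen for this sketch.
    execFile(command, args, { timeout: 60000 }, (error, stdout) => {
      if (error) reject(error);
      else resolve(stdout);
    });
  });
}

// Hypothetical mapping from URL path to R script (illustrative names only).
const SCRIPTS = new Map([['/forecast', 'forecast.R']]);

// Build an HTTP server that launches the matching R script for each request.
function createServer(rCommand = 'Rscript') {
  return http.createServer(async (req, res) => {
    const script = SCRIPTS.get(req.url);
    if (!script) {
      res.writeHead(404, { 'Content-Type': 'text/plain' });
      res.end('unknown endpoint');
      return;
    }
    try {
      const output = await runScript(rCommand, [script]);
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end(output);
    } catch (err) {
      res.writeHead(500, { 'Content-Type': 'text/plain' });
      res.end('script failed');
    }
  });
}
```

Calling `createServer().listen(3000)` would start the server on port 3000 (again, an arbitrary choice). Because each request spawns a separate R process, two slow requests run side by side instead of queuing behind each other, at the cost of paying R's startup time on every call.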
Simple, elegant, and trivial to implement.