Monday, July 30, 2018

Are functions colored?

There is a blog post that I read some weeks ago when I was in a frenzy reading Rust Language RFCs, specifically this one (yes, I'm crazy and read the better part of that thread; the comment that mentioned the article is here).

I want to begin with a little context because my opinions here shifted a lot while thinking about the involved topics.

First: how Rust handles exceptional conditions. In a language such as Python, you could write a function like this:
def divide(dividend, divisor):
    if divisor == 0:
        raise RuntimeError("Divide by zero")
    return dividend / divisor
There's a built-in ZeroDivisionError for this, but that's not my point. The important thing is that "raising" (or in other languages "throwing") an exception can be handled by the caller, or not (again, the quality of this code is not the point):
def foo():
    try:
        return divide(3, 0)
    except RuntimeError:
        return None

def bar():
    x = divide(3, 0)
    print(x)
foo provides a default value for division by zero, bar propagates the error to its caller, who then has to handle (or again propagate) it.

Rust takes a different approach. Rust doesn't have exceptions; instead, it has a Result type that's defined roughly like this:
pub enum Result<T, E> {
    Ok(T),
    Err(E),
}
A Result can either be Ok and contain a value of type T, or an Err and contain an error value of type E. Our function could look like this:
fn divide(dividend: u32, divisor: u32) -> Result<u32, ()> {
    if divisor == 0 {
        Err(())
    } else {
        Ok(dividend / divisor)
    }
}
Here E is (), which means an Err doesn't carry any additional information. This is how a caller could use this:
fn foo() -> Option<u32> {
    match divide(3, 0) {
        Result::Ok(value) => Some(value),
        Result::Err(_) => None,
    }
}

fn bar() {
    let x = divide(3, 0);
    println!("{}", x.unwrap());
}

fn baz() {
    if let Result::Ok(x) = divide(3, 0) {
        println!("{}", x);
    }
}
Here the discrepancies between Python and Rust are very apparent:
  • foo returns Option<u32>, not just u32. That means the caller again has to unpack that value if they need a u32 (see the sketch after this list). In the Python version, returning None would probably just lead to an exception later, wherever that None can't be processed.
  • bar uses a second error handling mode that is kind of similar to exceptions, but is not the first choice in Rust: it panics. unwrap() returns the value of a result, but if the result was an error, the panic (usually) kills the thread or process. Panics can be caught, but that's meant for infrastructure code such as thread pools, not for everyday application error handling.
  • baz swallows the error: the if let statement matches only a successful result, so if there's no match, nothing happens.
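To make the first point concrete, here is a minimal sketch of what a caller of the foo from above has to write to get back to a plain u32 (the function name and the fallback value are made up):
fn caller() -> u32 {
    match foo() {
        Some(value) => value,
        None => 0, // fall back to some default
    }
    // or, shorter: foo().unwrap_or(0)
}
The Option doesn't disappear on its own; the caller either converts it away or keeps passing it up.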
There is an additional way of handling errors that I'll need again later:
fn quux() -> Result<(), ()> {
    let x = divide(3, 0)?;
    println!("{}", x);
    Ok(())
}
Notice the question mark. This operator - I'll call it the try operator - returns early with the error if the result was an Err, or evaluates to the contained value if it was Ok. This early-returning behavior feels a little like raising an exception, but it requires that the function itself return a Result - otherwise it couldn't return an Err! A side effect is that, even though we don't have a meaningful value to return, we need to end the function with Ok(()), i.e. return a successful result without any data.
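For intuition, this is roughly what the try operator expands to - a simplified sketch (the real desugaring also converts the error type via the From trait):
fn quux_desugared() -> Result<(), ()> {
    let x = match divide(3, 0) {
        Ok(value) => value,
        Err(e) => return Err(e), // early return; only possible because we return a Result
    };
    println!("{}", x);
    Ok(())
}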

Now back to code colors. This is the point where you should read the linked blog post, if you haven't already. It proposes a language where
  • every function is either blue or red
  • those colors are called with different syntax
  • only red functions can call other red functions
  • red functions are "annoying"
Those red functions are async functions that need async/await syntax to call. For the rest, read the linked post. An important point here is transparency: callers have to know a function's color, and a higher-order function written for blue functions can't accept red ones, so a separate red variant (and thus code duplication) might be necessary.

At first the argument that async poses a needless separation into two colors seemed intriguing, although I wasn't entirely convinced. But that doubt felt lazy, caused by habit rather than reason. I left it at that, followed some more discussions about possible syntactic options for async/await in Rust, and wrote some actual async Rust code.

Rust tries to get things right. The people behind Rust experiment with features a lot before they're stabilized, and they wouldn't just add an await keyword because that's what other languages do; Rust's syntax should not only be familiar, it should be practical and reduce the possibility for programmer errors.

A concern with await is the combination with the question mark. Async operations are often ones that could fail, so you would await them and then check for an error - like this:
(await do_something())?;  // or
await do_something()?;
The first is ugly, but with the second, would you expect it to await first and apply the try operator second, or the other way around? That's why people were discussing a postfix variant:
do_something().await?;  // or
do_something().await!()?;  // actually proposed postfix macro syntax, or even
do_something()~?;  // I made that up, but in principle, why not?
This reads properly from left to right. And, now I'm getting to the point, it shows that the blue/red distinction is not really unique to the topic of async functions. Error handling, at least the explicit kind using Result, is basically the same:
  • every function is either blue or red - either you return T or Result<T>
  • those colors are called with different syntax - you need to extract Results, but not Ts (a short sketch after this list makes this concrete)
  • only red functions can call other red functions - not entirely. You can do one of three things:
    • make your function red and use the try operator (quux);
    • deal with the implications and keep your function blue (foo, bar, baz);
    • write buggy code (bar, baz depending on requirements)
  • red functions are "annoying" - Results are more annoying than Ts
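To make the Result version of the higher-order-function problem concrete, here is a sketch (assuming a fallible parse step) of how the red color leaks into an ordinary map over a collection:
fn parse_all(inputs: &[&str]) -> Result<Vec<u32>, std::num::ParseIntError> {
    inputs.iter()
        .map(|s| s.parse::<u32>()) // the closure is "red": it returns a Result
        .collect() // so the whole pipeline collects into a Result, stopping at the first Err
}
The closure returns a Result, so the return type of parse_all has to change as well - its callers turn red, too.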
Does that mean that Results are a bad way of handling errors? I don't think so. How do exceptions hold up in this comparison?
  • every function is either blue or red - if you throw exceptions, that's a difference in contract; I take it as a yes
  • those colors are called with different syntax - no, and that's the plus: your higher-order functions don't see any color
  • only red functions can call other red functions - nope, but I think that was already debatable
  • red functions are "annoying" - not to call, but to call correctly
In fact, if we look at checked exceptions as they exist in Java, we get basically the same picture as with async/await or Results: you can't put a function that may throw a checked exception somewhere where no checked exceptions are expected.

So Results and (unchecked) exceptions are both valid mechanisms on a spectrum of explicit vs. implicit error handling. await is on the explicit side of the asynchronicity spectrum; it remains to be seen how implicit asynchronicity can be implemented properly (Go might qualify as an already working solution here). It may not be a good fit for Rust, which chooses the explicit path, but dynamic languages might benefit from a proven solution to this.

Sunday, July 22, 2018

Type-level states in Rust

This is a pattern I saw recently while reading some hardware abstraction code for microcontroller programming with Rust. On microcontrollers, you can access pins of your chip and use them for different kinds of work. For example, one pin could be used for input, another for output, and yet another pin could be configured for "alternate functions". Also, an input pin could have a pullup or pulldown resistor, or none ("floating").

So there's a hierarchy of configurations a pin could use:
  • Input
    • Floating (default)
    • PullUp
    • PullDown
  • Output
    • PushPull
    • OpenDrain
  • AF0 (alternate function 0)
  • AF1
  • ...
This state is stored in the microcontroller hardware, but this pattern could also be useful when the state is stored elsewhere, like connection state that's stored in the network card's driver, or state on a server.

A classical way to model this is using an enum, storing the state somewhere, and checking the state at runtime (or just trusting the programmer):
#[derive(PartialEq, Clone, Copy)]
pub enum Mode {
    InputFloating,
    InputPullUp,
    Output,
    // more features left out for brevity
}

pub struct Pin {
    mode: Mode, // just for checking
}

impl Pin {
    pub fn new() -> Self {
        Pin { mode: Mode::InputFloating, /* just for checking */ }
    }

    pub fn configure(&mut self, mode: Mode) {
        self.mode = mode;  // just for checking
        // configure the hardware
        match mode {
            Mode::InputFloating => {}
            Mode::InputPullUp => {}
            Mode::Output => {}
        }
    }

    pub fn read(&self) -> bool {
        assert!(self.mode != Mode::Output); // just for checking
        false // bit is read from hardware
    }

    pub fn write(&self, value: bool) {
        assert!(self.mode == Mode::Output); // just for checking
        // bit is written to hardware
    }
}
There are a couple of downsides with this approach:
  • If we want checks, the value is stored twice: once in the struct and once in a register of the microcontroller
  • If we want checks, we also slow down the actual program.
  • If we trust the programmer instead, we can remove all the "just for checking" lines/items, but now we're trusting the programmer.
  • Even with checks, we only see errors at runtime, and there could be lots of code paths that lead to writing or reading a pin.
Luckily, we can do better!
pub struct InputFloatingPin(());
pub struct InputPullUpPin(());
pub struct OutputPin(());

impl InputFloatingPin {
    pub fn new() -> Self {
        InputFloatingPin(())
    }

    pub fn into_input_pull_up(self) -> InputPullUpPin {
        // configure the hardware
        InputPullUpPin(())
    }

    pub fn into_output(self) -> OutputPin {
        // configure the hardware
        OutputPin(())
    }

    pub fn read(&self) -> bool {
        false // bit is read from hardware
    }
}

impl InputPullUpPin {
    // pub fn into_...

    pub fn read(&self) -> bool {
        false // bit is read from hardware
    }
}

impl OutputPin {
    // pub fn into_...

    pub fn write(&self, value: bool) {
        // bit is written to hardware
    }
}
Here, every configuration has its own struct type, and we use into_... methods to configure the pin and convert between the associated struct types. The struct definitions with (()) at the end look a little funny; that's for visibility reasons: the () is a zero-sized private field that prevents the structs from being constructed outside the declaring module. Clients have to go through InputFloatingPin::new(), meaning they have to start with the reset configuration.

So we have a bunch of structs with no values in them, which seems kind of stupid. But as noted above, the configuration is in hardware anyway. Also, creating structs that don't need memory is not only cheap, it has no cost at all! We get all the checks from above, but at compile time without slowing the program. We can even use these types to specify that a function requires a pin in a certain configuration.
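As a small sketch of that last point (blink is a made-up example function):
fn blink(pin: &OutputPin) {
    pin.write(true);
    pin.write(false);
}

fn main() {
    let pin = InputFloatingPin::new();
    // blink(&pin); // compile error: expected &OutputPin, found &InputFloatingPin
    let pin = pin.into_output(); // consumes the input pin, configures the hardware
    blink(&pin);
}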

There is a downside, however: code duplication. There are the into_... methods I left out, and there's the read method that's written twice.

So let's fix that with generics. The idea is to define a struct Pin<MODE>; all the into_... methods would be defined on Pin<_>, thus available in any configuration. Pin<Output> is trivial, but crucially, we model the proper configuration hierarchy for inputs: read is implemented for Pin<Input<_>>! So no matter what kind of input we have, we can read from it:
use std::marker::PhantomData;

pub struct Pin<MODE>(PhantomData<MODE>);

pub struct Input<MODE>(PhantomData<MODE>);

pub struct Floating(());
pub struct PullUp(());

pub struct Output(());

impl Pin<Input<Floating>> {
    pub fn new() -> Self {
        Pin(PhantomData)
    }
}

impl<MODE> Pin<MODE> {
    pub fn into_input_floating(self) -> Pin<Input<Floating>> {
        // configure the hardware
        Pin(PhantomData)
    }

    pub fn into_input_pull_up(self) -> Pin<Input<PullUp>> {
        // configure the hardware
        Pin(PhantomData)
    }

    pub fn into_output(self) -> Pin<Output> {
        // configure the hardware
        Pin(PhantomData)
    }
}

impl<MODE> Pin<Input<MODE>> {
    pub fn read(&self) -> bool {
        false // bit is read from hardware
    }
}

impl Pin<Output> {
    pub fn write(&self, value: bool) {
        // bit is written to hardware
    }
}
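A quick usage sketch of the generic version, showing the conversions and which operations compile in which state:
fn main() {
    let pin: Pin<Input<Floating>> = Pin::new();
    let _ = pin.read();                 // reading works for any input mode
    let pin = pin.into_input_pull_up(); // now a Pin<Input<PullUp>>
    let _ = pin.read();                 // still an input, so read is still available
    let pin = pin.into_output();        // now a Pin<Output>
    pin.write(true);
    // pin.read(); // compile error: no read method on Pin<Output>
}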

Wednesday, July 18, 2018

HTTP and Websockets

A quick epilogue to the service/topic story: what approach is used in HTTP (and many other Internet protocols)? Of course, it's a classical request/reply architecture, and that is too bad!

HTTP has become one of the most prevalent protocols, but it does not provide a way for the server to initiate communication. Back when browsers displayed static web pages, that was fine, but now the content that is served to browsers is highly dynamic. And browsers, for security reasons, can't just use any protocol. For a long time, HTTP was the only option.

An initial "fix" for this was HTTP long polling: the client would issue an HTTP request, but instead of responding immediately, the server would wait for some event to occur. Once the client got a response, it would immediately make another request, which would then block again. That's not what HTTP was supposed to do, but it works.

Fortunately, we recently got WebSockets, which handle bidirectional communication in a clean manner. Now, we basically have this:
Browser   Webserver
   ---- http ---->    Browser sends requests, server replies
   <- websocket ->    Browser and server can both send messages at any time
And ta-da! Universality was restored!

Sunday, July 15, 2018

The ROS Service/Topic abstractions

In the last post, I quickly outlined the two messaging facilities of ROS (topics and services), but then focused on possible underlying I/O models and said that a polling mechanism makes for a good primitive.

I think it's clear that without the ability to poll multiple event sources, it's hard to implement efficient asynchronous IO. On the other hand, the opposite direction is really easy. Let's take the code from last time:
rospy.poll(sub1, sub2)
data = sub1.receive()
if data is not None:
    rospy.loginfo(rospy.get_caller_id() + "I heard %s", data.data)
data = sub2.receive()
if data is not None:
    rospy.loginfo(rospy.get_caller_id() + "I heard %s", data.data)
To perform a blocking receive, all you have to do is write this:
rospy.poll(sub)
data = sub.receive()
So supporting polling does not push one into nonblocking IO, but not supporting it does push one into blocking IO.

Now let's look a level higher at the actual messaging. ROS provides us with two tools here:
  • topics allow multiple subscribers to receive all messages from multiple publishers;
  • services allow multiple clients to send requests and receive replies from a single server.
You will notice that both types of connections are asymmetric: topic subscribers and servers can't initiate communication (subscribers can't send messages at all). So these primitives push us toward specific communication patterns - surely they can't support arbitrary protocols, right?

Not on their own, but we could use a pair of topics or a pair of services to model a full-duplex connection:
Node 1          Node 2
   --- topic_12 -->    Node 1 publishes to node 2
   <-- topic_21 ---    Node 2 publishes to node 1
or
Node 1          Node 2
   -- service_12 ->    Node 1 is client of node 2
   <- service_21 --    Node 2 is client of node 1
In the first case, you basically send independent messages; in the second case, each message gets a response as well. I also want to show a third topology:
Node 1          Node 2
   -- service_12 ->    Node 1 is client of node 2
   <-- topic_21 ---    Node 2 publishes to node 1
This is also full-duplex, and way better aligned to real-world needs. This approach comes rather naturally when designing a system:
  • I (Node 1) want to give my robot (Node 2) commands, for this I need a service.
  • I also want to know my robot's state, for this I need a topic.
  • (and this approach, unlike the others, is easily extended to multiple clients)
That's the value behind the topic/service abstraction: although it seems to constrain you, it actually helps with the patterns that appear in the real world. Another aspect: topics and services are constrained to a single message type (or request/reply message type pair). How do you work with that? Easy, use separate services and topics for different kinds of data. What seems like a constraint at first is actually just pushing you into separation of concerns!

Sunday, July 8, 2018

Approaches to IO

I'm currently looking at ROS which, despite its name (Robot Operating System), is neither an operating system nor at its core particularly specific to robotics. ROS includes libraries that help you model and program robots, but at its core, it is a framework for distributed applications.

A quick overview: a ROS application consists of several nodes (processes) that communicate via topics (broadcast) and services (remote procedure calls). Using a message definition language, you can define which data is sent between nodes. There's also a master node which provides name resolution, so that clients can find the topics and services they're interested in. Libraries can provide you with message definitions and utilities, and you will usually use third-party nodes in your application. I would say these nodes still count as libraries, but unlike a traditional library you don't call their code in the context of your process. Instead the node runs as its own process and you use topics and services for communication.

This may sound familiar because it's basically the same as a microservice architecture or probably dozens of other approaches to distributed applications. What follows is applicable to a lot of different systems, but I'll use ROS to present my code samples.

Topics, publisher:
import rospy
from std_msgs.msg import String

pub = rospy.Publisher('chatter', String, queue_size=10)
rospy.init_node('talker', anonymous=True)
rate = rospy.Rate(10) # 10hz
while not rospy.is_shutdown():
    hello_str = "hello world %s" % rospy.get_time()
    rospy.loginfo(hello_str)
    pub.publish(hello_str)
    rate.sleep()
Topics, subscriber:
import rospy
from std_msgs.msg import String

def callback(data):
    rospy.loginfo(rospy.get_caller_id() + "I heard %s", data.data)

rospy.init_node('listener', anonymous=True)
rospy.Subscriber("chatter", String, callback)

# spin() simply keeps python from exiting until this node is stopped
rospy.spin()
The code is simplified and taken from the ROS Wiki. It should be pretty easy to understand: the publisher sends a message to the chatter topic ten times a second, and the subscriber registers a callback to handle these messages. There can be any number of publishers and subscribers on the same topic.

A different approach to the same problem might be a more traditional blocking I/O API:
import rospy
from std_msgs.msg import String

rospy.init_node('listener', anonymous=True)
sub = rospy.Subscriber("chatter", String)

while not rospy.is_shutdown():
    data = sub.receive()
    rospy.loginfo(rospy.get_caller_id() + "I heard %s", data.data)
This code is of course only hypothetical. The idea is that receive() would block until a new message arrived. What are the up- and downsides? One upside is that it is perfectly clear how the message is handled: right in the main thread. The original code only had a callback, and we had basically no idea where that is called. Does each subscription spawn a new thread? Are the callbacks called inside spin()? If so, what happens when we replace spin() by a busy loop? (for the record, my tests suggest one thread per subscription, but maybe ROS is smarter when the number of subscriptions rises. Or maybe not - another downside of the original approach?)

But of course there are upsides to the original code: for one, it hides the logic of receiving messages from the user. What you're interested in is the message payload, not the control flow you have to use to get it! And the most serious consideration: what if you have two subscriptions? You can't simply call receive for both, because it blocks: while you're blocked waiting on subscription A, subscription B can't be serviced, no matter how many messages it has waiting.

This is a classic problem in blocking I/O, and the naive solution is multithreading: just block a dedicated thread that doesn't have to process anything else. For robotics, this is probably fine, but spawning and switching between operating system threads is relatively expensive and so not that appropriate for servers that operate under high load.
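For illustration, here is a minimal sketch of that thread-per-source approach in Rust, with standard-library channels standing in for the two subscriptions (an assumption purely for the sketch - this is not ROS code):
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx1, rx1) = mpsc::channel();
    let (tx2, rx2) = mpsc::channel();

    // one dedicated thread per blocking source; each may block without starving the other
    let chatter = thread::spawn(move || {
        for msg in rx1 { // iterating a Receiver blocks until a message arrives
            println!("chatter: {}", msg);
        }
    });
    let gossip = thread::spawn(move || {
        for msg in rx2 {
            println!("gossip: {}", msg);
        }
    });

    tx1.send("hello").unwrap();
    tx2.send("world").unwrap();
    drop(tx1); // closing the channels ends the receive loops
    drop(tx2);
    chatter.join().unwrap();
    gossip.join().unwrap();
}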

One step in the right direction would be a nonblocking receive:
import rospy
from std_msgs.msg import String

rospy.init_node('listener', anonymous=True)
sub1 = rospy.Subscriber("chatter", String)
sub2 = rospy.Subscriber("gossip", String)

while not rospy.is_shutdown():
    data = sub1.receive()
    if data is not None:
        rospy.loginfo(rospy.get_caller_id() + "I heard %s", data.data)
    data = sub2.receive()
    if data is not None:
        rospy.loginfo(rospy.get_caller_id() + "I heard %s", data.data)
Here, instead of blocking, receive will immediately return None if there is no message yet. If you haven't spotted it, the downside is that this is a busy loop: no matter whether there are messages or not, the CPU will constantly iterate through this code. In the blocking version, the operating system would just put the thread aside until the socket that receive is waiting for has new bytes. Now that is not the case, and the OS will keep running the thread even though it has no useful work to do.

What we really want is some kind of multi-source polling. Let me show just one possible variant:
import rospy
from std_msgs.msg import String

rospy.init_node('listener', anonymous=True)
sub1 = rospy.Subscriber("chatter", String)
sub2 = rospy.Subscriber("gossip", String)

while not rospy.is_shutdown():
    rospy.poll(sub1, sub2)
    data = sub1.receive()
    if data is not None:
        rospy.loginfo(rospy.get_caller_id() + "I heard %s", data.data)
    data = sub2.receive()
    if data is not None:
        rospy.loginfo(rospy.get_caller_id() + "I heard %s", data.data)
Here we still assume receive is nonblocking, but at the top of each iteration we call poll, which blocks until data is available on either subscription. Afterwards, we still need to check the subscriptions ourselves, but performance-wise this works.

It turns out that this - or at least the principle behind it - is a rather useful primitive for the operating system to provide. It makes it possible to build many different abstractions efficiently: receive alone is not enough, but poll is.

Sunday, July 1, 2018

Software tooling

Learning a new programming language is hard. For the obvious reasons, but also because of tooling. (This post is mostly going to be a rant, ignore it if you're looking for practical information.)

I think nothing is more frustrating when learning a new language than not being sure where errors come from. Do you need to install another library that you're building against? Do you need to add paths to your environment? Are you passing incorrect flags to the compiler? Is your project definition missing some configuration?

And it goes on: What IDEs have good integration? Is the language or its community opinionated on project layout, documentation, testing? If not, are there still any best practices to follow? All this makes it hard to effectively use a new language at first, even if the language's syntax and semantics are familiar from the start. But all this should be trivial to take care of.

When I started programming (with Java), it was (as far as I could tell) the time of Ant build scripts. In its functionality, Ant is similar to Make, in that it defines rules for building specific parts of a project, and dependencies between rules to deduce which rules should fire and in which order. Unlike Make, Ant is Java based and its build files are XML (thus making Ant ugly to read or write).

When I heard about Maven, it was a huge revelation. Not because it made expressing build processes so much easier (or because it was less ugly - Maven still used XML build files), but because unlike Ant, Maven included what I consider the most important part of language infrastructure: dependency management! Tell Maven what libraries your project depends on, and it will download them and configure all paths so they're found. Nice!

So there's dependency management, there's building your project, and also importantly, there's installing the tools in the first place. I usually don't want to pollute my operating system by installing stuff when I have no idea what I'm actually doing.

Let's leave Java behind and look at Python. One major problem is the split between Python 2 and 3. They are largely incompatible and Python 3 has lots of cool stuff, but some software still requires Python 2. On the plus side, there are pyenv and virtualenv: pyenv lets you install Python versions without root, and virtualenv lets you install libraries per project.

Normally, your Python package will have a file setup.py which contains your project's description: name, version, dependencies, etc. As this is a Python file, you might think it is meant to contain build instructions, but that's not really the case. The file should contain a single call to setup(...), which uses the data provided in its parameters in a fairly declarative way. Further, in "modern" usage, you will likely use pip to handle dependencies. For example, pip install -e ./myproject will find ./myproject/setup.py and install the package (as editable, -e) as well as all missing dependencies.

So pip handles dependencies. What about building? Well, Python is interpreted, so normally not much. To make a "distribution" of your package, you can call python setup.py sdist and/or python setup.py bdist_wheel to make a source or binary distribution.

In JavaScript, the situation is more confusing. There are tools like bower, grunt, gulp, browserify, webpack, npm, yarn, etc. that all enter the picture, and tomorrow there will be even more. Also, there are three major ways to execute JavaScript: in a browser, in Node and in Internet Explorer. But despite that, if you specify a package.json and use e.g. npm, you have a pretty good chance of distributing your packages to your users. (Does it show that I have not used JS as much as Python?^^)

My last look will be at Rust, because it's cool and is in a rather special situation: Rust is new, and its build tool, cargo, was developed from the beginning together with the language and the rest of the tooling. It handles dependencies, building code and documentation, testing and publishing packages all in one. For installing Rust, there is the rustup tool, which fits approximately pyenv's profile: it lets you install and choose between different versions of Rust. This is especially important as Rust has a six-week release cycle and a nightly channel that lets you use new unstable features.

Cargo is not required to build Rust code - you can use the compiler rustc on its own as well - but usually there's no reason to go that way. That means that Rust is a rather homogeneous environment (as long as you're not using unstable features) and you can focus on getting your code to compile at all; the rest of your environment will most likely work out of the box.

Sunday, June 24, 2018

Interoperability

Looking back at my earlier posts, I'm surprised how utterly Java-centric everything I did was. Never mind things like bytecode instrumentation, of course that's JVM specific, and it's cool stuff!

But for other things, I think I would try to achieve more interoperability these days. Specifically, I'm looking at JGroups. It has some nice features in terms of consensus, making sure that all known nodes have the same view of the system, but looking at it purely as a networking library, it shockingly limits you to implementing all your nodes in a JVM language.

I think there are generally three layers of interoperability. The JGroups example is one extreme, basically no interoperability. It has one Java implementation and its network protocols are specific to that implementation. I don't know the specifics of JGroups, but the protocol used may well be tailored to transmit very specific manipulations of JGroups nodes' states. In other words, there are lots of assumptions that make JGroups easy to implement in Java, but very, very hard to port to another language.

One kind of interoperability comes through dynamic linking. For example, through native methods, Java can call code written in C or another language that is compiled into machine code. As long as the JVM uses the ABI and calling conventions of the native library, access is no problem. The native code is executed in the calling process, so this kind of interoperability is strictly for libraries, not for client/server communication.

So if it was CGroups instead - written in C - then there could simply be a Java binding and a .Net binding etc., and all platforms using these bindings could cooperate. Unfortunately the reverse isn't true: the JVM is, quite obviously, a virtual machine, and you can't simply call code targeted at a VM. That VM first has to run, and then you have to interface with the VM before finally getting access to the library. This is true for all languages requiring a runtime environment (i.e. interpreted languages), but it gets worse the more complicated the runtime gets.

Which leads me to the third kind of interoperability: wire protocols, which solve the issue of having a protocol that is full of assumptions from the original implementation. Instead, the first step in designing the application or library is to design a binary or text-based encoding and to clearly state what kinds of communication are allowed, in what order, at what time, etc. The protocol has to be simple and explicit enough to write down, so it should also be possible to implement in different languages. That doesn't allow you to use one library in different contexts, but it makes it feasible to port the library to different platforms.
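As a tiny illustration of "simple and explicit enough to write down", here is a sketch of a made-up wire format (not any real protocol): each message is a 4-byte big-endian length followed by that many bytes of UTF-8 text. The Rust code below is one possible implementation, but the point is that the one-sentence description is all another language needs in order to interoperate:
use std::convert::TryInto;

// encode: 4-byte big-endian length prefix, then the UTF-8 payload
fn encode(msg: &str) -> Vec<u8> {
    let mut buf = (msg.len() as u32).to_be_bytes().to_vec();
    buf.extend_from_slice(msg.as_bytes());
    buf
}

// decode one message from the front of a buffer, if it is complete
fn decode(buf: &[u8]) -> Option<&str> {
    let len = u32::from_be_bytes(buf.get(..4)?.try_into().ok()?) as usize;
    std::str::from_utf8(buf.get(4..4 + len)?).ok()
}

fn main() {
    let wire = encode("hello");
    assert_eq!(decode(&wire), Some("hello"));
}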

If it weren't for that last kind of interoperability, the Internet would be a very different place. Imagine if websites used a browser-specific protocol instead of HTTP! Microservice architectures also rely on well-defined protocols to connect otherwise incompatible systems: of course, a service written in Java can expose some JGroups-derived functionality via, say, HTTP, and then any other application can access that service.