Jan 31, 2014

Information Packets

You might have noticed that I didn't go into much detail when describing what an Information Packet has inside of it. This is because the implementation of IPs affects the way components are written and, consequently, their reuse potential.

Keep in mind that this is a topic that can be debated forever, but as with any engineering topic we have to settle on something that does the job. The main objective of this post is to recap what's being done and what can be done. If you have any opinions, please share them on the mailing list and I'll update this post.

Plain Text

The earliest approach was plain text, following the idea of UNIX pipes and filters. All programs were designed to deal with text: parsers and generators that worked with tabular or regular data. Regular expressions are used as cookie cutters for data extraction, and spaces are usually used to separate fields. This approach has inherent problems: programs that do non-trivial operations on the received data end up structured as parsers, with implementations that are either brittle in how they understand the input, or robust but complex enough to deal with any kind of input. We don't want that in our FBP components, so we either settle for very simple data in our IPs or find some other mechanism. The advantages of text are clear: it's cross-platform, it needs no additional serialization, it can be compressed (with better results than seemingly random data), and it's easy to log, to inspect visually, and to edit (with text editors). A special kind of packet that signaled EOF was used, but with groups this is no longer needed.
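To make the brittleness concrete, here is a minimal Python sketch of a text-based component. The record layout (name, quantity and price separated by spaces) and the name parse_record are hypothetical, made up only for this illustration.

```python
import re

# Hypothetical record layout: "<name> <quantity> <price>", separated by spaces.
RECORD = re.compile(r"^(\S+)\s+(\d+)\s+(\d+\.\d+)$")

def parse_record(line):
    """Cut the fields out of one text IP, or return None when the line does not fit."""
    match = RECORD.match(line.strip())
    if match is None:
        return None
    name, quantity, price = match.groups()
    return name, int(quantity), float(price)

# Fits the expected layout, so the fields are extracted.
print(parse_record("bolt 12 0.35"))
# A space inside the name is enough to break the cookie cutter.
print(parse_record("hex bolt 12 0.35"))
```

The second call returns None: the parser has to either reject the line or grow more rules, which is exactly the brittle-versus-complex trade-off described above.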

Objects

Now that OOP is common, another approach was devised: what if we send a reference to an object as an IP? The advantages are clear: less parsing, just some casting on the receiving end. If a String is expected, the receiver casts to String, and so on. This looks simple, but we give up static type checking by the compiler, and unexpected errors can occur at run time. What if we simply send primitive types as IPs, for example an instance of String or Float? The advantage is that we can make use of static checks again, but we lose a lot of flexibility when it comes to plugging components together. One solution is to place casting components between connections, or to have a mechanism that "tags" input ports so that they work as casts. But then again, we are going to have errors when the casts are not valid.
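A small Python sketch of the casting idea follows. Python has no casts, so an isinstance check stands in for a failed cast; the components double and cast_to_float are hypothetical names used only here.

```python
def double(ip):
    """Hypothetical component that expects a float IP."""
    if not isinstance(ip, float):
        # The run-time equivalent of a cast that fails on the receiving end.
        raise TypeError(f"expected float, got {type(ip).__name__}")
    return ip * 2

def cast_to_float(ip):
    """Hypothetical casting component placed between two connections."""
    return float(ip)

# With the casting component in between, a text IP can feed a float port.
print(double(cast_to_float("1.5")))

# The cast itself can still be invalid, and we only find out at run time.
try:
    double(cast_to_float("hello world!"))
except ValueError as error:
    print("invalid cast:", error)
```

The casting component moves the failure to a known place in the network, but it does not remove it.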

IP class

What about creating an IP class that holds the packet data and some other details?

First we need to settle on the type of the data field inside the IP class. We could use the language's primitive types, but we would still face the problems of high coupling and complicated code reuse. It's also not cross-platform. Wrapping a float inside an IP is going to be the same as sending the Float by itself. Programming languages already have a solution for this kind of problem, and we see it every day in the form of literals.

3451.2f
"hello world!"
0x543
200
[ 1, 2, 3, 4]

The compiler parses this text and handles it behind the curtains. We can go back to using text for storing the data, but still know whether it parses as the expected type. Now take a look at this:

somebody@somehost.com
http://flowbased.tk
$54.03
2/3

Inspired by REBOL, we can have more than just the base types your average statically typed language uses. We can have email IPs, money IPs, fractions, URLs, etc.
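As a rough illustration of literal-style IPs, the Python sketch below guesses the type of a packet from its text. The rule names and the regular expressions are assumptions made for this example; they are deliberately simple and are not REBOL's actual lexical rules.

```python
import re

# Hypothetical literal rules, tried in order; the first match wins.
RULES = [
    ("url",      re.compile(r"^[a-z][a-z0-9+.-]*://\S+$")),
    ("email",    re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
    ("money",    re.compile(r"^\$\d+(\.\d{2})?$")),
    ("fraction", re.compile(r"^\d+/\d+$")),
    ("integer",  re.compile(r"^-?\d+$")),
    ("decimal",  re.compile(r"^-?\d+\.\d+$")),
]

def ip_type(text):
    """Return the name of the first rule that matches, or "string" as a fallback."""
    for name, pattern in RULES:
        if pattern.match(text):
            return name
    return "string"

for sample in ["somebody@somehost.com", "http://flowbased.tk", "$54.03", "2/3", "200"]:
    print(sample, "->", ip_type(sample))
```

The data stays as text on the wire, and the type is recovered by parsing, the same way a compiler treats literals.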

These packet types also differ in which operations can or cannot be applied to them. For example, money can't be added to a date, emails can't be concatenated, and so on. I'm experimenting with the implementation of this IP system, but you get the general idea: IP parsers, plus rules for running operations on the parsed IPs. If we operate at the IP level instead of using the IP simply as a container, component reuse can increase dramatically. A component that calculates the average of a group of packets does not need to care about the types inside them; if by any chance one of the packets was "hello world!", it will generate an error packet, and so on. Language generics can also help with this task, but I haven't seen it implemented yet.
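The averaging component could look something like the Python sketch below. It is only an assumption of how such a component might behave: the (kind, value) tuple used as a packet and the name average are hypothetical, and the standard library's Fraction does the parsing of integers, decimals and fractions.

```python
from fractions import Fraction

def average(packets):
    """Hypothetical component: average a group of text IPs, or emit an error packet."""
    if not packets:
        return ("error", "empty group")
    total = Fraction(0)
    for text in packets:
        try:
            # Fraction accepts "200", "3451.2" and "2/3" alike.
            total += Fraction(text)
        except ValueError:
            return ("error", f"not a number: {text!r}")
    return ("number", str(total / len(packets)))

print(average(["1/2", "1.5", "4"]))
print(average(["200", "hello world!"]))
```

The component never asks which numeric type it received; anything that does not parse as a number simply becomes an error packet.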

NoFlo's approach to packets is to use any kind of object, but a feature is being implemented to tag ports with "int", "string", "any", etc. in order to check for network correctness. I suggest using this tagging approach in any kind of FBP system, because it enables static checking of the flowgram. I will update this post when the feature is working.
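To show what static checking of a flowgram could mean, here is a minimal Python sketch. It is not NoFlo's API: the port tables, the component names and check_connections are all hypothetical, and "any" is assumed to be compatible with every other tag.

```python
# Hypothetical flowgram description: a tag for every output and input port.
OUT_PORTS = {("reader", "out"): "string", ("counter", "count"): "int"}
IN_PORTS = {
    ("counter", "in"): "string",
    ("upper", "in"): "string",
    ("printer", "in"): "any",
}

def check_connections(connections):
    """Return a description of every connection whose port tags do not match."""
    errors = []
    for source, target in connections:
        out_tag, in_tag = OUT_PORTS[source], IN_PORTS[target]
        # "any" on either side is assumed to accept everything.
        if "any" not in (out_tag, in_tag) and out_tag != in_tag:
            errors.append(
                f"{source[0]}.{source[1]} ({out_tag}) -> {target[0]}.{target[1]} ({in_tag})"
            )
    return errors

good = [(("reader", "out"), ("counter", "in")), (("counter", "count"), ("printer", "in"))]
bad = [(("counter", "count"), ("upper", "in"))]
print(check_connections(good))
print(check_connections(bad))
```

The check runs on the network description alone, before any packet is sent, which is what makes it static.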

XML IP

There is an interesting approach by xbeans, which uses XML as the packet format. At first glance it seems like a waste of processing power, but so many ideas get killed every day in the name of performance that we need to give this one a chance.

Advantages

  • Packet validation with schemas.
  • URI to obtain schemas.
  • Serialization is already done for inter network exchange.
  • Elements can be tagged with metadata.

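But there are problems with XML. For example, the order of the elements in a list may not survive processing unless the order is explicit. This does not mean that an XML file is going to scramble itself into a mess: the XML data model keeps child elements in document order (attribute order, on the other hand, is not significant). The risk is in the layers on top, since a schema or data-binding tool that treats a list as an unordered collection does not guarantee that the elements are read back in the order they were written.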

We can patch this with explicit ordering, either as metadata or by encapsulating entries within a numbered set of tags.
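The metadata variant can be sketched in Python with the standard library's ElementTree. The packet layout, the "index" attribute and the function ordered_items are assumptions made for this example, not part of xbeans.

```python
import xml.etree.ElementTree as ET

# Hypothetical packet: every entry carries its position as metadata.
PACKET = """
<packet>
  <item index="2">c</item>
  <item index="0">a</item>
  <item index="1">b</item>
</packet>
"""

def ordered_items(xml_text):
    """Read the entries and restore their order from the explicit index attribute."""
    root = ET.fromstring(xml_text)
    items = sorted(root.findall("item"), key=lambda item: int(item.get("index")))
    return [item.text for item in items]

print(ordered_items(PACKET))
```

The receiver no longer depends on the order in which the entries arrive, only on the index that travels with each one.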

Even if we consider this overkill, an IP descriptor/schema repository reachable via URIs could still be useful, in a similar fashion to the way we could have a component repository for code reuse. Even if only a few components use a specific type of IP, we could fetch the descriptor, use it in our own components and maintain compatibility. The schemas should be versioned and all that jazz to respect compatibility. Another interesting side effect would be fetching all the components that use a certain IP, so that we can find the kind of component we need. There are many interesting things to discuss about repositories of descriptors and components, but I'll cover that in another post.

Conclusion

Each FBP system settles on the simplest approach for the kind of problems it was designed to solve. Plain text can do the job, but other approaches are being explored all the time.
