For Geeks: Progress on Mesh4x: Cloud Services, Architecture, Adapters, and Adopters

posted on: December 13, 2008

As the year draws to a close we have a mixed blessing. On one hand, we have a small but growing portfolio of technology stemming from our organization’s immediate goals to improve disease detection and public health in South East Asia, being built at a steady pace by our small but ultra-capable team. On the other hand, the scenarios we are addressing are proving relevant across the health and humanitarian space, which is generating growing demand in both breadth and depth. Exciting times indeed!

Of our main technology efforts (Riff, GeoChat, Mesh4x, TrackerNews.net), Mesh4x (http://www.mesh4x.org) is the one that reached real-world deployments first.

From mesh4x.org:

“The goal of mesh4x is to provide a portfolio of libraries, tools and applications that simplify using standards-based data meshes from multiple platforms and languages…”

The libraries can be used right away by developers who integrate them into their own applications, so there was no need for them to wait for a more packaged set of user interfaces and end-to-end experiences.

Why it matters and why InSTEDD is working on this

Data meshes have appealing characteristics for our users, so our contributions to the Mesh4x project are driven by observed data-sharing needs in the health and humanitarian space.

  • Symmetrical: They allow data to exist in a concurrent multi-master environment where updates can be applied at any node in the mesh.
  • Asynchronous: They allow offline updates to information and synchronization with other nodes without requiring data locks, essential for occasionally connected applications.
  • Dynamic: The synchronization can happen even in constantly changing connectivity topologies. I can sync with a server and later the sync can be done between my client and another client, who could then sync with another server if the first one isn’t there, and so on.
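To make the multi-master idea concrete, here is a minimal sketch of how two nodes might merge their copies without locks. It is loosely inspired by FeedSync-style versioning and is not the Mesh4x implementation: the `Version`, `Item`, `pick_winner` and `merge` names are hypothetical, and real FeedSync also keeps a per-item history to detect conflicts, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Version:
    updates: int  # how many times the item has been changed
    when: str     # UTC ISO 8601 timestamp of the last change (sorts as text)
    by: str       # identifier of the node that made the last change


@dataclass
class Item:
    id: str
    payload: dict
    version: Version


def pick_winner(a: Item, b: Item) -> Item:
    """Pick one copy deterministically so every node reaches the same answer."""
    def key(item: Item):
        # Assumption: more updates wins, then the later change, then the node id.
        return (item.version.updates, item.version.when, item.version.by)
    return a if key(a) >= key(b) else b


def merge(local: dict, incoming: dict) -> dict:
    """Merge another node's items into ours; no locks, no master node."""
    merged = dict(local)
    for item_id, item in incoming.items():
        if item_id in merged:
            merged[item_id] = pick_winner(merged[item_id], item)
        else:
            merged[item_id] = item
    return merged
```

Because the winner is chosen by the same rule everywhere, a phone merging a laptop's changes and the laptop merging the phone's changes end up with the same data, whichever order the syncs happen in.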

This matters to us as these characteristics help information flow and data sharing even in the tough contexts we face:

  • Symmetrical: No organization or application has, de-facto, greater control over information than any other. Symmetry allows power to be shared equally amongst partners, in a true multi-master way, resulting in less hoarding of live data.
  • Asynchronous: Connectivity is an occasional luxury, and the most up to date information is found where it is less likely to have a connection. Storing changes locally and sharing them opportunistically keeps information moving.
  • Dynamic: Connections are opportunistic – you may not have Internet access at all, but you have access to local wifi networks, physical contact with other devices, etc. Data will eventually get to the desired endpoints as it leaps opportunistically between participants.

Some concrete applications of mesh4x in the space:

Mesh4x goes mobile with JavaROSA, allows you to sync data on your handset with no Internet

Mesh4x SMS Adapter: Sync data without an Internet connection

I have another blog post I should release soon that highlights the proven value of meshes and Groove in the humanitarian space, and my personal introduction to the uses of this architectural pattern.

But this post is about the progress & directions for the project.

Cloud-Based Service

In the last post we mentioned building a cloud-based service as a contribution to the space. The demand was for an always-online, cheap-to-host, simple server that could act as a data store and as a relay point for devices connected to the Internet.

The implementation was embarrassingly simple on Amazon’s Elastic Compute Cloud (EC2, a dynamic and virtualized hosting environment) and S3. As a matter of fact, a single Java servlet running on Tomcat + Linux and driving the Java Mesh4x sync libraries ("Mesh4j") provides the heart of the logic. Less code is the best code!

We are doing a pilot with the Centers for Disease Control (CDC), synchronizing their Microsoft Access-based EpiInfo application, and they asked if the health surveys they were taking could be automatically geo-mapped as the users synchronized to share their information. This led us to incorporate an ontology ("schema") mapping aspect to tell the server "expose a KML feed taking THIS as the title, description, address, and timestamp for the items".

Taha describes the work with CDC on his Biosurveillance 2.0 blog and why using mesh4x will help them extend the effectiveness of EpiInfo for outbreak investigation.

We will be opening this service up progressively as we test it out with initial users and tweak it based on their feedback; I hope in a couple of months to have a tested version we can point you to publicly! In the meantime, contact us if you are interested via email or if you are a developer via the Mesh4x.org code project.

Part of the forcing function for writing this post this week is that we’ve been chatting with CDC, JavaROSA, and others about these store/endpoint/mapping capabilities and I’d rather we start the collaboration early before we accidentally diverge codebases or approaches.

Under the Hood

This is the architecture that the server has been going towards these last couple of weeks:

[Diagram: a drill-down of where the server architecture is trending. Aaagh, lots of coloured boxes!]

Update APIs:

These allow other applications to change the data in the service. A mesh endpoint allows FeedSync-style updates, but we’ll add AtomPub for simpler edits via HTTP POST and other RESTful verbs that are easy to manage from JavaScript or are useful if you don’t need the full power of the mesh. A JavaROSA endpoint will allow the right metadata to be exposed to JavaROSA or AndroidROSA handsets, and accept updates.
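For a feel of how simple an AtomPub-style edit can be, here is a minimal sketch that builds an Atom entry and the HTTP POST that would carry it. The collection URL, the `build_entry` and `build_post` helpers, and the choice of entry fields are assumptions made for illustration; the service's real endpoints are not published yet and may well look different.

```python
import urllib.request
import uuid
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Hypothetical collection URL, used only for illustration.
COLLECTION_URL = "https://mesh.example.org/feeds/patients"


def build_entry(title: str, author: str, content: str, updated: str) -> bytes:
    """Serialize one record as a minimal Atom entry."""
    ET.register_namespace("", ATOM_NS)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = f"urn:uuid:{uuid.uuid4()}"
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    author_el = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author_el, f"{{{ATOM_NS}}}name").text = author
    ET.SubElement(entry, f"{{{ATOM_NS}}}content", {"type": "text"}).text = content
    return ET.tostring(entry, encoding="utf-8")


def build_post(entry_xml: bytes, url: str = COLLECTION_URL) -> urllib.request.Request:
    """Prepare (but do not send) the POST that would add the entry to a collection."""
    return urllib.request.Request(
        url,
        data=entry_xml,
        method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it. The point is that a client needs nothing more than an XML document and a single HTTP call, with no sync history to manage.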

GeoChat and FrontlineSMS bridges would allow message forwarding and sending semi-structured data directly in via SMS.

Storage:

This is the storage layer for all the data, plus the configuration, security information, etc. needed to keep the service running. In our web-based instance, all this data is stored in S3, but if you wanted to host this in your own office or in a clinic, it would all be sitting inside a MySQL instance. As a matter of fact, all the mesh4x services’ information is managed by mesh4x itself, so the actual configuration data is stored via an adapter.

Ontology Extraction:

Our service differs from a database in that you don’t need to tell it the schema of your information up front. As a matter of fact, we would like to know as little as possible about the format of your data. We prefer to let applications change and evolve the data they use without having to ask developers to change database structures or write specific code for each case. But knowing just a little about the structure of your data helps with things such as defining mappings and filters, so we try to infer as much as we can. The Ontology Extraction component allows you to submit RDF-formed information (or XForms-based data, or any other format that has a transformer) and we keep track of (for example) what fields make up your entities. If you supply such ontologies yourself (in RDFS, or as an XForm definition) we keep them around, too (e.g. ‘Patient Date of Birth is a Date/Time field’).
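As a rough illustration of what "inferring as much as we can" might mean, the sketch below looks at submitted records and guesses which fields exist and what type each one holds. It works on plain dictionaries of strings rather than RDF, and the `infer_type` and `extract_ontology` names and the type labels are assumptions for this example, not the component's actual design.

```python
from datetime import datetime


def infer_type(value: str) -> str:
    """Guess the narrowest type label that can parse the text value."""
    parsers = (
        ("integer", int),
        ("decimal", float),
        ("dateTime", datetime.fromisoformat),
    )
    for label, parse in parsers:
        try:
            parse(value)
            return label
        except ValueError:
            continue
    return "string"


def extract_ontology(records: list) -> dict:
    """Collect every field seen across the records, with a guessed type."""
    ontology = {}
    for record in records:
        for field, value in record.items():
            seen = infer_type(value)
            previous = ontology.get(field)
            # If two records disagree about a field, fall back to plain text.
            ontology[field] = seen if previous in (None, seen) else "string"
    return ontology
```

Records do not need to share the same fields: a field that appears in only some of them is simply added to what we know, which is how the data can evolve without anyone changing a schema first.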

[Image: a little drawing meant to represent an RDF triple]

Internally, we are using RDF as the default standard to represent data and ontologies. RDF has many properties that make it the simplest appropriate choice, but that would be the topic of a whole different post in and of itself.

Ontology Mapping:

Ontology Mapping allows us to map fields and entities of different ontologies to help us make sense of your data. For example, to draw a nice map of your data we need a title, a descriptive summary, a position, and a timestamp associated with each entity. Which field should provide the timestamp? Which address or coordinate fields should be used to put an item on the map? How should the description be composed from the data? Mappers allow us to do this, and in the future you will be able to define these yourself through the user interface.
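One way to picture a mapper is as a small piece of configuration that answers exactly those questions for one kind of record. The sketch below is an assumption about the shape such a mapping could take; `MapMapping`, `to_map_item` and the sample field names are hypothetical, and it uses coordinate fields rather than addresses to stay short.

```python
from dataclasses import dataclass


@dataclass
class MapMapping:
    title_field: str           # which field becomes the title
    description_template: str  # how to compose the description from fields
    latitude_field: str        # which fields place the item on the map
    longitude_field: str
    timestamp_field: str       # which field provides the timestamp


def to_map_item(record: dict, mapping: MapMapping) -> dict:
    """Turn an application record into the generic shape a map needs."""
    return {
        "title": record[mapping.title_field],
        "description": mapping.description_template.format_map(record),
        "latitude": float(record[mapping.latitude_field]),
        "longitude": float(record[mapping.longitude_field]),
        "timestamp": record[mapping.timestamp_field],
    }


# Example mapping for a made-up survey record.
survey_mapping = MapMapping(
    title_field="village",
    description_template="{cases} cases of {symptom}",
    latitude_field="lat",
    longitude_field="lon",
    timestamp_field="visited",
)
```

The record itself stays untouched. Only the mapping knows that "village" is a title and "visited" is a timestamp, so the same data can be mapped differently for another consumer.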

Filtering:

Filtering is essential in a mesh where little devices and big devices coexist. You could have refugee records for a whole country in one mesh4x mesh, but on a mobile phone you’d probably only want to keep a subset of that. As soon as we expose filters it will be easy for a phone to say ‘I work with patients in village X’ and just sync that subset of data.
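A filter can be as small as a yes/no question asked of each record before it is sent to a device. The sketch below is a hypothetical illustration (filters are not exposed yet, and `field_equals` and `items_to_sync` are names made up for this example):

```python
from typing import Callable, Iterable

Record = dict
Filter = Callable[[Record], bool]


def field_equals(field: str, value: str) -> Filter:
    """Build a filter such as 'I work with patients in village X'."""
    return lambda record: record.get(field) == value


def items_to_sync(records: Iterable[Record], keep: Filter) -> list:
    """Select only the subset of the mesh that this device asked for."""
    return [record for record in records if keep(record)]
```

A phone would then call something like `items_to_sync(all_records, field_equals("village", "X"))` and receive only its own slice, while a server with plenty of storage simply syncs without a filter.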

Format Transformers:

Format Transformers are components built to translate data into specific formats. GeoRSS and KML are standard formats for representing information with a geographic aspect. You can view the KML in Google Earth, for example, and items appear on the map as people sync their data to the server.
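To show what a KML transformer boils down to, here is a minimal sketch that turns a list of items into a KML document with one placemark each. It assumes each item is a dictionary with `title`, `description`, `latitude`, `longitude` and `timestamp` keys; that input shape and the `to_kml` name are assumptions for this example, not the service's actual transformer.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def to_kml(items: list) -> str:
    """Render items as a KML document with one Placemark per item."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for item in items:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = item["title"]
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = item["description"]
        timestamp = ET.SubElement(placemark, f"{{{KML_NS}}}TimeStamp")
        ET.SubElement(timestamp, f"{{{KML_NS}}}when").text = item["timestamp"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects longitude first, then latitude.
        coordinates = f"{item['longitude']},{item['latitude']}"
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = coordinates
    return ET.tostring(kml, encoding="unicode")
```

Serving the resulting text as a feed is enough for a KML viewer to plot the items, and regenerating it after each sync is what would make new items show up on the map.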

Transformers for XForms models and XForms forms allow us to translate the information of your entities and their ontologies into XForms formats. We see the utility and the pragmatism of XForms models as a way of exchanging records and of defining the UI model of the forms users see in XForms, so these transformers allow us to go from our internal RDF-centric representations to these broadly adopted formats.

Sync Adapters:

Finally, you have all this data here, but you probably want to work with it elsewhere! Folks have suggested/requested the following as potential endpoints for the data:

  • Google Spreadsheets: we have a Microsoft Excel adapter, so why not a Google spreadsheet one? Imagine creating a form, having it fill out a spreadsheet with gadgets for analytics, and then. Google spreadsheets are also great when lots of people online have to work live on the same data.
  • Zoho is coming up with lots of useful applications. Imagine synchronizing your Zoho app with a table in your MySQL or MS-Access database.
  • MySQL: a lot of websites out there (for better or worse) run with their MySQL instance exposed on an open network port. Someone we were working with in Mukdahan, Thailand (a 12-hour truck ride from Bangkok), asked the simple question: if I give you my connection string, can you just put the data there for me? Seemed simple and straightforward, so we will line it up in front of other needs!

Together with running sync adapters we will need some user interface to schedule these updates, define mappings between schemas/ontologies, and resolve conflicts. A nice UI for this may end up taking a big part of the project effort, so if you can point us to open source projects that do any of this or want to contribute, don’t be shy!

These mappings are part of the mesh too, so in the future (assuming someone requests it from InSTEDD or contributes the source) you could be offline and mark an Excel spreadsheet as ‘shared’, and when you sync, not only would the data travel back and forth, but the server itself could create a Google spreadsheet endpoint (or something similar) with the same information for others in your team to use!

Putting it all together

In my next post I will explain how all the pieces of the Mesh4x project come together to help integrate data from disparate systems and connect these applications into a synthetic whole, instead of leaving dozens of islands of information.

More information

http://www.cdc.gov/epiinfo/ EpiInfo is CDC’s outbreak investigation surveying tool. You can participate in their Open Source project on CodePlex: http://www.codeplex.com/EpiInfo. We are working with them to enable synchronization over the cloud of their MySQL/Access based tool.

They also had a new release recently, announced just hours ago. Congratulations to the CDC team!
