dinsdag 30 juni 2009

Development priorities

We have set some development priorities after the first release that we did recently. These are partly reflected in the Ticket tracker, but to be honest that doesn't quite do it justice yet.

A small but important update is to change the output format for the Matrix Compiler tool. Currently, it outputs to a full matrix in DL format. This is a bit inflexible, as it does not allow attributes other than a label for the nodes, and it outputs a full matrix even for a large, sparce matrix. That results in bigger output files, and thus longer processing/io time. In the (hopefull near) future, it will allow for the addition of including attributes to nodes as well as connections. You can track the progress on this issue here. We're already testing it...

A larger task has to do with reworking the Record Grouper and the Relation Calculator. The first part of that job is to specialize the Record Grouper to do only that: group objects based on some kind of relation between them. This means ripping out a large piece of complicated code (but not throwing that away, see later) and focus on a good UI to make it easier to work with. This means fine tuning the layout, but also add options to undo, store and re-load your work, etc. That is a lot of work in itself, but it will simplify the code considderably making it easier to maintain.

Another large task is to get the Relation Calculator into a usable shape. This is a complex tool. The basic idea is that will become a specilized tool to calculate a similarity or distance measure between any two objects in the database, and be as flexible as possible as to how to calculate that. Currently, we only use SQL queries to calculate such scores, but that is sometimes limited, often complex, and usually relatively slow because most SQL backends don't use multi-threading for single queries, let alone utilize things like letting the videocard do work for you.

You can express a lot of such measures in SQL, but it is often complex to do, especially if you are not that used to using databases. That makes the current way of working harder to use for novices, but also for seasoned researchers who just are not that into SQL. It is however more flexible than being stuck to defaults too much. In this tool, we want to make the standard things easy, and yet be as flexible as possible to enable more advanced use.

The goal is to make the standard analysis that you would normally run on a database generated by the ISI Data Importer and processed by the Word Splitter and optionally the Record Grouper as easy as selecting them and optionally set some time slices or threshold, after which ready-to-analyze output is generated. We aim to release a first working combination of these tools at the end of August.

Geen opmerkingen:

Een reactie posten