Instrumental Data Interaction

Status: Early draft

I am interested in how to support people in interacting with data. Millions do it already, from querying databases, doing complex analysis in computational notebooks, working with data in spreadsheets, and interacting with information visualisations to understand a phenomena or relations through data. But in the age of the data deluge, the tools have a hard time keeping up and cognitive overload is an ever present challenge. I think we can create better tools for interacting with data, and I think there are some strong ideas from human-computer interaction and interaction design that can guide us.

In this post I explore the merits of thinking about the tools we use to interact with data as an instrument, based on ideas from instrumental interaction (Beaudouin-Lafon, 2000) and activity theory (Kaptelinin, 1996). As tool builders, this perspective allow us to examine the tools and how we use them, as well as exploring how we can improve the tool with a focus on what the tool can do, the goal and quality of interaction. Building better tools for interacting with data would also make the vast amounts of available data accessible to a larger audience and groups.

In instrumental interaction, interface elements (e.g. a colorpicker, a scrollbar, cropping tool, text-highlighting) are tools that we use to manipulate domain objects (data, documents, images etc.) through interaction. This view is from activity theory, where use is conceptualized as a subject using an instrument on a object towards realizing a goal. This could be someone using a SQL editor to query a library database with the goal of getting information on e.g most popular books based on loan data; it could be someone using a visualization library or diagrams in Excel to create a visualization; or it could be someone exploring their energy consumption through an information visualization. In all cases, instrumental interaction offer a framing for analyzing the tool within the context of an activity for the purpose of improving the tool.

SQL and instrumental interaction with data

One of the most common and fundamental ways to interact with data, is through using the Structured Query Language (SQL). Together with the relational database, it was developed to help "the non-professional user into effective communication with a formatted database" (Chamberlin & Boyce, 1974) and "to put end users into direct touch with the information stored in computers" (Codd, 2007). SQL was considered an ideal way for non-programmers to access and query data. It was developed with an onset in English (Structured English Query Language -- SEQUEL) and a declarative approach to data querying. SQL turned out to be hugely successful and is one of the most commonly used technologies today. However, the idea that the casual user would use SQL to interact with databases on a frequent basis have not come true (yet). I'm not saying that SQL is the ideal tool for casual users, but if we could improve the tool to cater more to casual and non-technical users (e.g. designers, journalists, citizens), we can make the vast amount of data in SQL databases accessible to a significantly larger audience.

SQL as a tool

Note: The original version of this note contained an interactive SQL shell demo. The interactive elements have been removed during migration.

In the original interactive example, you could use SQL as an instrument for interacting with a publications database. When we have to interact with an unfamiliar and undocumented database, we try to use the instrument to poke around to explore and discover how the data is organized and how to query for the information we need. This often involves multiple steps of trial-and-error to identify appropriate tables, their fields, relations and data:

SELECT name FROM sqlite_master;

SELECT * FROM authors LIMIT 5;

SELECT * FROM sqlite_master WHERE name='authors';

And you can do more complex stuff like querying multiple tables for specific results:

SELECT publications.title, authors.first, authors.last, publications.year
FROM publications
INNER JOIN authorship ON publications.wid=authorship.wid
INNER JOIN authors ON authorship.aid=authors.aid
WHERE authors.last='Dewey';

Improving our tool

That is neat, but there are a couple of problems and/or improvements we can introduce to make the tool more usable. The first one is with the structure and syntax of the query itself. For experienced users that are familiar with the data model they are interacting with, it is trivial to write SELECT title FROM publications;. They know that there is a table called publications and that it contains a column with title. But for novice SQL users or experienced users interacting with a new database, the SELECT first approach does not help us before we know where we select FROM (using SELECT * as a probing strategy can help). There are a couple of interaction design ideas we can apply to solve this issue. First, for novice users, we can add something as simple as autocomplete. We know the abstract syntax tree for SQL, so it is trivial to provide users with hints to both keywords and errors.

There is another small problem. Early research on the usability of different query languages by Reisner (1981) shows that some users perform poorly due to the design choice of starting queries with SELECT instead of allowing FROM first as well. Beginning with SELECT or FROM represent two fundamental query writing strategies. The former is the translation strategy where the user is invited to look for appropriate terms and search through trial and error (or find the data model described elsewhere). The latter is the operations-on-tables where the users indicate where they want to search before they search (operate). The operations-on-tables strategy is closer to the idea of instrumental interaction, than the translation strategy which focus more on semantic understanding. So it seems silly that we require the users to remember what they want to select before they know where from. Let's just address the issue by improving our tool so that valid SQL queries can start with FROM as well.

There is another significant improvement we can make that combines the autocomplete and utilizes allowing queries to start with FROM. In many cases we know something about what the user can query. We likely have a finite number of tables that can go into our autocomplete feature, and once the user have shown intent with FROM we can also suggest columns. Putting FROM in front makes the autocomplete flow easier to implement, because we can narrow the suggestions based on the table selection and not the column selection.

Summary: Playing language games

Considering SQL as an instrument for interacting with data allows us to consider multiple ways we can improve the tool without ever having to change the domain object, the database. Doing so could potentially allow us to make SQL accessible to more casual non-technical users and thereby make a vast amount of data accessible as well.

While we are fiddling with the SQL language, why not introduce localization? If our future users are casual users and we do not expect them to write SQL without help, why do we expect them to use English? I know that SQL is derived from Structured English Query Language (SEQUEL), but when people use it across the globe and within a broad range of domains, I don't see why we cannot offer localization or customized queries to increase accessibility and semantic intuition. And when the database, table and column names are already in local languages, why force people to mix and match when they try to translate their question to a query? If Danish journalists think "hent en liste over alle folketingets medlemmer, der stemte for lovforslag L87" (fetch me a list of all the members of parliament who voted for bill L87), why not make HENT synonymous with SELECT? That would be trivial.

Similarly, and with more recent research into both Large Language Models and Natural Language Interfaces for Databases, our tool could be improved with both domain specific language features and/or a database specific LLM that supported queries based on natural language.

There are other improvements to consider as well. Combining writing queries with other types of feedback, e.g. tabular views, visualizations and tabular views to support more complex queries like joins and group by. This is something I will investigate in a later post.

Finally, when we 'detach' SQL from the database through the idea of SQL as an instrument, we can also explore multimodal interaction where our SQL tool can be used to interact with other data sources (APIs, tables, structured data etc.) and we can develop other tools that can interact with our database as well. This multi-multi scenario where users can use familiar tools with unfamiliar data or people can collaborate without being forced to use the same tool (tool asymmetry) is key in supporting collaboration across disciplines, e.g. in open data projects.

← Back to Notes