I am interested in how to support people in interacting with data. Millions do it already, from querying databases, doing complex analysis in computational notebooks, working with data in spreadsheets, and interacting with information visualisations to understand a phenomena or relations through data. But in the age of the data deluge, the tools have a hard time keeping up and cognitive overload is an ever present challenge. I think we can create better tools for interacting with data, and I think there are some strong ideas from human-computer interaction and interaction design that can guide us.
In this post I explore the merits of thinking about the tools we use to interact with data as an instrument, based on ideas from instrumental interaction (Beaudouin-Lafon, 2000) and activity theory (Kaptelinin, 1996). As tool builders, this perspective allow us to examine the tools and how we use them, as well as exploring how we can improve the tool with a focus on what the tool can do, the goal and quality of interaction. Building better tools for interacting with data would also make the vast amounts of available data accessible to a larger audience and groups.
In instrumental interaction, interface elements (e.g. a colorpicker, a scrollbar, cropping tool, text-highlighting) are tools that we use to manipulate domain objects (data, documents, images etc.) through interaction. This view is from activity theory, where use is conceptualized as a subject using an instrument on a object towards realizing a goal. This could be someone using a SQL editor to query a library database with the goal of getting information on e.g most popular books based on loan data; it could be someone using a visualization library or diagrams in Excel to create a visualization; or it could be someone exploring their energy consumption through an information visualization. In all cases, instrumental interaction offer a framing for analyzing the tool within the context of an activity for the purpose of improving the tool.
One of the most common and fundamental ways to interact with data, is through using the Structured Query Language (SQL). Together with the relational database, it was developed to help "the non-professional user into effective communication with a formatted database" (Chamberlin & Boyce, 1974) and "to put end users into direct touch with the information stored in computers" (Codd, 2007). SQL was considered an ideal way for non-programmers to access and query data. It was developed with an onset in English (Structured English Query Language -- SEQUEL) and a declarative approach to data querying. SQL turned out to be hugely successful and is one of the most commonly used technologies today. However, the idea that the casual user would use SQL to interact with databases on a frequent basis have not come true (yet). I'm not saying that SQL is the ideal tool for casual users, but if we could improve the tool to cater more to casual and non-technical users (e.g. designers, journalists, citizens), we can make the vast amount of data in SQL databases accessible to a significantly larger audience.
Note: The original version of this note contained an interactive SQL shell demo. The interactive elements have been removed during migration.
In the original interactive example, you could use SQL as an instrument for interacting with a publications database. When we have to interact with an unfamiliar and undocumented database, we try to use the instrument to poke around to explore and discover how the data is organized and how to query for the information we need. This often involves multiple steps of trial-and-error to identify appropriate tables, their fields, relations and data:
SELECT name FROM sqlite_master;
SELECT * FROM authors LIMIT 5;
SELECT * FROM sqlite_master WHERE name='authors';
And you can do more complex stuff like querying multiple tables for specific results:
SELECT publications.title, authors.first, authors.last, publications.year
FROM publications
INNER JOIN authorship ON publications.wid=authorship.wid
INNER JOIN authors ON authorship.aid=authors.aid
WHERE authors.last='Dewey';
That is neat, but there are a couple of problems and/or
improvements we can introduce to make the tool more
usable. The first one is with the structure and
syntax of the query itself. For experienced users that are
familiar with the data model they are interacting with, it
is trivial to write
SELECT title FROM publications;. They know that
there is a table called publications and that
it contains a column with title. But for novice
SQL users or experienced users interacting with a new
database, the SELECT first approach does not
help us before we know where we select
FROM (using SELECT * as a probing
strategy can help). There are a couple of interaction design
ideas we can apply to solve this issue. First, for novice
users, we can add something as simple as autocomplete. We
know the abstract syntax tree for SQL, so it is trivial to
provide users with hints to both keywords and errors.
There is another small problem. Early research on the
usability of different query languages by Reisner (1981)
shows that some users perform poorly due to the design
choice of starting queries with SELECT instead
of allowing FROM first as well. Beginning with
SELECT or FROM represent two
fundamental query writing strategies. The former is the
translation strategy where the user is invited to
look for appropriate terms and search through trial and
error (or find the data model described elsewhere). The
latter is the operations-on-tables where the users
indicate where they want to search before they search
(operate). The operations-on-tables strategy is closer to
the idea of instrumental interaction, than the
translation strategy which focus more on semantic
understanding. So it seems silly that we require the users
to remember what they want to select before they know where
from. Let's just address the issue by improving our tool so
that valid SQL queries can start with FROM as
well.
There is another significant improvement we can make that
combines the autocomplete and utilizes allowing queries to
start with FROM. In many cases we know
something about what the user can query. We likely
have a finite number of tables that can go into our
autocomplete feature, and once the user have shown intent
with FROM we can also suggest columns. Putting
FROM in front makes the autocomplete flow
easier to implement, because we can narrow the suggestions
based on the table selection and not the column selection.
Considering SQL as an instrument for interacting with data allows us to consider multiple ways we can improve the tool without ever having to change the domain object, the database. Doing so could potentially allow us to make SQL accessible to more casual non-technical users and thereby make a vast amount of data accessible as well.
While we are fiddling with the SQL language, why not
introduce localization? If our future users are casual users
and we do not expect them to write SQL without help, why do
we expect them to use English? I know that SQL is derived
from Structured English Query Language (SEQUEL), but when
people use it across the globe and within a broad range of
domains, I don't see why we cannot offer localization or
customized queries to increase accessibility and semantic
intuition. And when the database, table and column names are
already in local languages, why force people to mix and
match when they try to translate their question to a query?
If Danish journalists think "hent en liste over alle
folketingets medlemmer, der stemte for lovforslag L87"
(fetch me a list of all the members of parliament who voted
for bill L87), why not make HENT synonymous
with SELECT? That would be trivial.
Similarly, and with more recent research into both Large Language Models and Natural Language Interfaces for Databases, our tool could be improved with both domain specific language features and/or a database specific LLM that supported queries based on natural language.
There are other improvements to consider as well. Combining writing queries with other types of feedback, e.g. tabular views, visualizations and tabular views to support more complex queries like joins and group by. This is something I will investigate in a later post.
Finally, when we 'detach' SQL from the database through the idea of SQL as an instrument, we can also explore multimodal interaction where our SQL tool can be used to interact with other data sources (APIs, tables, structured data etc.) and we can develop other tools that can interact with our database as well. This multi-multi scenario where users can use familiar tools with unfamiliar data or people can collaborate without being forced to use the same tool (tool asymmetry) is key in supporting collaboration across disciplines, e.g. in open data projects.