How to create a Tabulated Open Questionnaire (TOQ) specification

methodology
measurement

April 11, 2024

In this brief blog post, I describe a number of steps you could take to create a Tabulated Open Questionnaire specification.

Tabulated Open Questionnaire specifications (or TOQ specs for short) are questionnaires described in an open standard. This allows converting them to Serialized Open Questionnaire specifications (or SOQ specs for short), which in turn can be included in open repositories such as https://operationalizations.com. For more background information, you can check the brief blog post / Mastodon thread Towards interoperability for psychological data.

The examples in this blog post are taken from the TOQ specs for the questionnaires with UQID URLs https://operationalizations.com/questionnaire/eq60eng_7rs8g3bd and https://operationalizations.com/questionnaire/bfi10eng_7sp9mjx3.

This post starts with a step-by-step guide to specifying a new questionnaire. That guide is followed by a full description of the spreadsheet format.

Specifying a new questionnaire

If you want to specify a new questionnaire in TOQ format, first copy an empty TOQ specification, for example that at https://measurement.rocks/toq-spec-example (click the File menu and select “Make a copy”).

The questionnaire label

Then, in the metadata_content column, in the row that has “label” in the metadata_field column, type the name of the questionnaire. The recommended format has the name, followed by, between parentheses, the language as an ISO 639-3 code and the number of items.

The questionnaire language

In the language_ISO639_3 row, specify the questionnaire’s language. This is the language used for the verbal stimuli presented to participants, for example in the questions, in the introduction, or in any other way.

Specify the language as an ISO 639-3 code. You can find these at https://iso639-3.sil.org/code_tables/639/data, through https://en.wikipedia.org/wiki/ISO_639-3, or through a variety of other resources.

The Unique Questionnaire Identifier

Then, produce a Unique Questionnaire Identifier (UQID) for the questionnaire. This consists of two parts. The first part is called the uqid_prefix, and you create it yourself. It can only contain Latin letters (a-z and A-Z) and Arabic digits (0-9) and must always start with a Latin letter. The recommended format is a short acronym for the questionnaire, followed by the number of items, and then the ISO 639-3 language code for the language, for example “eq60eng” and “bfi10eng”.
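The prefix rules above can be checked mechanically. Below is a minimal Python sketch (not part of any TOQ tooling; the function names `is_valid_prefix` and `recommended_uqid_prefix` are made up for this illustration) that builds a prefix in the recommended format and verifies that it only contains Latin letters and Arabic digits and starts with a letter.

```python
import re

# Rule from the text: only Latin letters (a-z, A-Z) and Arabic digits (0-9),
# and the first character must be a Latin letter.
PREFIX_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_prefix(prefix: str) -> bool:
    """Return True if the prefix follows the character rules."""
    return PREFIX_PATTERN.fullmatch(prefix) is not None


def recommended_uqid_prefix(acronym: str, n_items: int, language_code: str) -> str:
    """Build 'acronym + number of items + ISO 639-3 code' (hypothetical helper)."""
    prefix = f"{acronym}{n_items}{language_code}"
    if not is_valid_prefix(prefix):
        raise ValueError(f"Not a valid UQID prefix: {prefix!r}")
    return prefix


print(recommended_uqid_prefix("eq", 60, "eng"))   # eq60eng
print(recommended_uqid_prefix("bfi", 10, "eng"))  # bfi10eng
print(is_valid_prefix("10bfi"))                   # False (starts with a digit)
```

The same check applies to the other identifiers in a TOQ spec that follow these character rules, such as UIID prefixes and response registration template identifiers.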

The second part is a string of characters as produced by the {psyverse} R package (see the manual at https://psyverse.opens.science) or the Elsa Shiny App (see https://opens.science/apps/elsa). So either visit Elsa or fire up R and create the full UQID, which joins the prefix and this string with an underscore.

The measured construct(s)

Another important field to provide content for is the ucids field. Here, you can specify a URL linking to a comprehensive construct definition using its Unique Construct Identifier (or UCID), such as https://psycore.one/personality_79n2fh4s or https://psycore.one/expAttitude_expectation_73dnt5z1. Make sure to use the URL format (unlike for the UQID), so that the construct definition is immediately available (without people having to find out which construct repository it’s stored in).

This is a field that can be repeated on multiple rows; the metadata_field content is ucids on every row, and by specifying a different UCID URL in the metadata_content column, you can specify multiple constructs that you think should be measurable with the questionnaire you’re adding.
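To illustrate the repeated ucids rows and the URL requirement, here is a small Python sketch. It assumes the metadata worksheet has already been read into a list of (metadata_field, metadata_content) pairs; the helper names `is_url` and `collect_ucids` and the example label are invented for this example.

```python
from urllib.parse import urlparse


def is_url(value: str) -> bool:
    """Rough check that a value is an http(s) URL rather than a bare identifier."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def collect_ucids(metadata_rows):
    """Gather the content of every 'ucids' row and insist on the URL format."""
    ucids = [content for field, content in metadata_rows
             if field == "ucids" and content]
    not_urls = [ucid for ucid in ucids if not is_url(ucid)]
    if not_urls:
        raise ValueError(f"UCIDs should be given as URLs: {not_urls}")
    return ucids


# Hypothetical metadata rows; the two UCID URLs are the examples from the text.
rows = [
    ("label", "Example questionnaire"),
    ("ucids", "https://psycore.one/personality_79n2fh4s"),
    ("ucids", "https://psycore.one/expAttitude_expectation_73dnt5z1"),
]
print(collect_ucids(rows))
```

An empty ucids row is simply skipped here, in line with the option of leaving the field empty when no construct definition exists yet.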

If there is no construct definition stored yet, you can either add it yourself (see https://doi.org/jnjp), contact others and add it together, or leave the ucids field empty.

Sources

Often, one or more articles or websites will be available with more background information on the questionnaire. You can specify these in content on rows identified with sources in the metadata_field column. Like ucids rows, the sources rows can appear multiple times, so you can specify multiple sources.

If you specify scientific sources, try to find the Digital Object Identifier (DOI) for the source, and enter it at https://shortdoi.org to obtain the corresponding ShortDOI. ShortDOIs are very handy short unique identifiers for articles, chapters, and other objects with a DOI. ShortDOIs have a number of advantages over DOIs, such as being short 😬, but also consisting only of Latin letters (a-z) and Arabic digits (0-9), which means they can be (part of) filenames or column names in datasets (unlike DOIs, which contain slashes (/) and can contain other special characters). Therefore, ShortDOIs are used by the {metabefor} system for systematic reviews to uniquely identify sources (such as articles), which is another reason to use them here, too.

Regardless of whether you managed to obtain a ShortDOI, enter the sources as URLs, for example https://doi.org/djzd32, https://doi.org/dbh (how awesome is that?!? A three-character unique identifier for an article 🤩), or something like https://measures.scienceofbehaviorchange.org/measuredetails/da946282-a16c-46cf-a67c-40a2eb47d398.
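The practical difference between DOIs and ShortDOIs can be shown with a few lines of Python. This is only a sketch: the helper names are invented, and the long DOI below is a made-up placeholder, not a real reference.

```python
import re


def identifier_from_doi_url(url: str) -> str:
    """Return everything after 'doi.org/' in a DOI or ShortDOI URL."""
    return url.split("doi.org/", 1)[-1]


def is_filename_safe(identifier: str) -> bool:
    """True if the identifier only has lowercase Latin letters and Arabic digits."""
    return re.fullmatch(r"[a-z0-9]+", identifier) is not None


short = identifier_from_doi_url("https://doi.org/djzd32")
# Placeholder DOI, invented for this example only.
long = identifier_from_doi_url("https://doi.org/10.1234/placeholder.5678")

print(short, is_filename_safe(short))  # djzd32 True
print(long, is_filename_safe(long))    # 10.1234/placeholder.5678 False
```

Because the ShortDOI passes this check, it can be used directly as (part of) a filename or a column name, whereas the full DOI would first need its slash and dots replaced.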

Ontology references

In some cases, you may want to refer to an ontology, for example to a representation of the construct or the measurement instrument. You can use ontologyRefs for this. Like rows for ucids and sources, rows for ontologyRefs can be repeated to specify multiple references. And like for those two fields, make sure that you enter a valid URL that resolves to the relevant element in the relevant ontology.

The questionnaire description

In the description row, you can briefly describe the questionnaire. There are dedicated fields for more specific information, so you don’t have to go into too much detail (see below).

The measurement theory

A measurement instrument like a questionnaire is always based on a theory of how the measurement instrument works. This often involves the target construct and how the answers people give to the questions somehow crucially depend on that target construct. You can specify this as the measurement_theory. If you don’t know the measurement theory, you can also specify that (and then hopefully you can add it in a later version).

Note that the measurement theory underlying a questionnaire is very important if we take seriously the idea that questionnaires can measure constructs. Still, when adding questionnaires to a repository, specifying the measurement theory in detail is not your main responsibility. Also, many questionnaires are developed without corresponding measurement theories. Although that is a problematic practice, it is also often simply a fact of life, and should not prohibit those questionnaires from being added to an open repository.

Therefore, your primary task here is to be as transparent as you can. If as far as you know there is no source describing the measurement theory, it’s ok to just say that. If there might be sources but you don’t have time to look, similarly, just say that. And if you suspect the measurement theory to be described in a given source, but you don’t have time to read it, it’s also ok to say that.

Often, the important thing is to specify the questionnaire and its items in a machine-readable format (in the TOQ spec). That will also facilitate further work on the questionnaire’s measurement theory. There will be ample opportunity to specify that later.

The measurement scope

Questionnaires (and other measurement instruments) are developed with certain constraints. For example, thermometers generally only work on Earth and in a narrow range of temperatures (only -40 to 50 degrees Celsius). Similarly, questionnaires also only work for specific populations in specific circumstances. This scope is described here.

For example, if a questionnaire has only been studied among a very narrow, specific population (for example, first- or second-year psychology students in the Netherlands), it is unknown how it will perform in different populations (for example, students in other studies; or students in other countries; or the general population in the Netherlands; or the general population in other countries; or parents with one or more children; or people who are retired; or adolescents; etc).

For scope, like for measurement theory, the most important thing is to maximize transparency. Describe what you know. Doing this already enables others to build on this, for example by adding extra information they have about the scope where we know the questionnaire can be used.

The composition, administration, and aggregation

TOQ specs were developed as a machine-readable (and so, interoperable) format for questionnaires. They contain the most important parts: the questions (and ‘flanking’ content). However, the way these components need to be composed into one questionnaire can differ. For example, if a certain font needs to be used or a certain paper color, or the items need to be presented in a certain way, this can be specified in the composition_instructions.

Sometimes, there are specific procedures or requirements for the administration of a questionnaire (i.e. the procedure used to present the questionnaire to participants and register the responses). This can be provided as administration_procedure.

Often, the data series produced by participant responses to each item are somehow aggregated into one or several (if there are subscales) data series. For some questionnaires, this can mean averaging the data series for all items; for others, it can mean summing them; sometimes, some items have to be inverted first (e.g. by subtracting the data series from a number, e.g. for 5-point Likert scales scored 1-5, subtracting every value from 6), and sometimes, a questionnaire has multiple subscales, and those are aggregated to arrive at multiple final estimates. Whatever the procedure is, it can be described here in the aggregation_procedure_narrative.
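As a worked illustration of such an aggregation procedure, the Python sketch below inverts an item scored 1-5 (by subtracting each value from 6) and then averages or sums the items per participant. The item names and response values are invented, and real questionnaires may prescribe a different procedure.

```python
from statistics import mean


def invert(values, scale_min=1, scale_max=5):
    """Invert a data series; for a 1-5 scale this subtracts every value from 6."""
    return [scale_min + scale_max - value for value in values]


def aggregate(responses, inverted_items=(), how="mean"):
    """Aggregate item data series into one value per participant.

    responses maps an item identifier to a list with one value per participant.
    """
    if how not in ("mean", "sum"):
        raise ValueError("how must be 'mean' or 'sum'")
    series = [
        invert(values) if item in inverted_items else list(values)
        for item, values in responses.items()
    ]
    combine = mean if how == "mean" else sum
    # zip(*series) regroups the values so that each tuple holds one participant.
    return [combine(participant) for participant in zip(*series)]


# Hypothetical responses of three participants to two items.
responses = {"item1": [1, 5, 3], "item2": [5, 1, 3]}

print(invert([1, 2, 5]))                                      # [5, 4, 1]
print(aggregate(responses, inverted_items={"item2"}))         # [1, 5, 3]
print(aggregate(responses, inverted_items={"item2"}, how="sum"))  # [2, 10, 6]
```

For questionnaires with subscales, the same function could be called once per subscale, each time passing only the items that belong to it.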

TOQ spec version, date, and authors

Precursors

  • toq_specification_version
  • toq_specification_date
  • toq_specification_authorNames
  • toq_specification_authorOrcids
  • toq_specification_authorContacts
  • toq_specification_precursors

Comments

If you have any additional comments, you can specify them as comments.

Worksheets in a TOQ spec

A TOQ spec is a spreadsheet with six worksheets (each will be explained in detail below):

  • metadata: metadata about the questionnaire
  • items: the items in the questionnaire
  • response_registration_templates: templates that describe how participants’ responses are registered
  • adapters: information about how adapters should convert the questionnaire to their corresponding formats
  • flanking_content: “non-item questionnaire content”, such as an introduction or closing message
  • content_types: the content types that can be used to specify the stimuli that comprise the questionnaire

The metadata worksheet

The metadata worksheet has two columns. The first is named metadata_field and the second is named metadata_content. In each row, the metadata_content cell holds information that is identified by the metadata_field column.

Each field (as identified in the metadata_field column) can only occur once (i.e. on one row), with three exceptions: the ucids field, the sources field, and the ontologyRefs field can occur multiple times.
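The "only once, with three exceptions" rule lends itself to a quick automated check. The following Python sketch assumes the worksheet rows are available as (metadata_field, metadata_content) pairs; the function name `duplicated_fields` and the example rows are invented for illustration.

```python
from collections import Counter

# The three fields that, according to the text, may occur on multiple rows.
REPEATABLE_FIELDS = {"ucids", "sources", "ontologyRefs"}


def duplicated_fields(metadata_rows):
    """Return the non-repeatable fields that occur on more than one row."""
    counts = Counter(field for field, _content in metadata_rows)
    return sorted(
        field for field, n in counts.items()
        if n > 1 and field not in REPEATABLE_FIELDS
    )


# Hypothetical rows: 'sources' may repeat, but 'label' may not.
rows = [
    ("label", "Example questionnaire"),
    ("label", "Another label"),
    ("sources", "https://doi.org/djzd32"),
    ("sources", "https://doi.org/dbh"),
]
print(duplicated_fields(rows))  # ['label']
```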

The metadata fields that can occur are the following:

  • label: A human-readable label (title or name) for the questionnaire. The recommended format is “Name (Language, # items)”, for example “Empathy Quotient (English, 60 items)” and “Big Five Inventory (English, 10 items)”.
  • uqid_prefix: The prefix of the Unique Questionnaire Identifier (UQID). This can only contain Latin letters (a-z and A-Z) and Arabic digits (0-9) and must always start with a Latin letter. The recommended format is a short acronym for the questionnaire, followed by the number of items, and then the ISO639-3 language code for the language, for example “eq60eng” and “bfi10eng”.
  • uqid: The full UQID, which consists of the UQID prefix, an underscore, and a string of characters as produced by the {psyverse} R package (see the manual at https://psyverse.opens.science) or the Elsa Shiny App (see https://opens.science/apps/elsa).
  • ucids: One or more Unique Construct Identifiers (UCIDs), in the URL format (for example https://psycore.one/personality_79n2fh4s). These UCID URLs link to comprehensive construct definitions of the construct(s) that this questionnaire measures.
  • sources: One or more URLs to source(s) with more information about this questionnaire, preferably using the ShortDOI format for articles, for example https://doi.org/djzd32 and https://doi.org/dbh.
  • ontologyRefs: One or more URLs to relevant entities (or, classes, attributes, etc) in an ontology.
  • description: A description of the questionnaire.
  • language_ISO639_3: The language of the questionnaire, as an ISO 639-3 code (e.g., to list the 10 most spoken languages: “eng” for English, “zho” for Chinese, “hin” for Hindi, “spa” for Spanish, “fra” for French, “ara” for Arabic, “ben” for Bengali, “por” for Portuguese, “rus” for Russian, and “urd” for Urdu).
  • measurement_theory: The measurement theory underlying the measurement instrument. This explains the assumptions and theory that valid measurement relies on.
  • measurement_scope: The scope of the measurement instrument. Just like most thermometers only work on Earth and in a narrow range of temperatures (only -40 to 50 degrees Celsius), questionnaires also only work for specific populations in specific circumstances. This scope is described here.
  • composition_instructions: Instructions for composing the measurement instrument from the items and the flanking content. This can be, for example, something like “Typically, the items are presented in a matrix/array format (with the question texts in rows and the response options in four columns).”
  • administration_procedure: The procedure used to administer the measurement instrument. For example, whether participants can be disturbed while completing the questionnaire; whether there is any additional instruction that has to be provided verbally by the administrator; whether the participant has to be in quiet surroundings or whether that doesn’t matter; etc.
  • aggregation_procedure_narrative: A narrative description of the aggregation procedure for the data series formed by participants’ responses to the items. For example, responses to some items may need to be transformed (for example, subtracting the resulting value from another value, to invert them); and the data series can be averaged or summed, etc.
  • version: The version of the specification. For example, you may want to update information about the measurement scope as you learn more about how the measurement instrument works / performs. Note that if you change items, you cannot assume that you don’t change the nature of the measurement instrument, so you should update the identifiers (and so, technically, you are creating a new measurement instrument).
  • comments: Any comments you may want to make.

The items worksheet

The items worksheet has the following columns:

  • sequence: Numbers that indicate the order of the items in the questionnaire.
  • uiid_prefix: The prefix to the Unique Item Identifier (UIID). This prefix is a more or less human-readable component of the UIID that helps with quick identification of the item. The prefix does not have to be unique; the other components of the UIID take care of that. The UIID prefix can only contain Latin letters (a-z and A-Z) and Arabic digits (0-9), and must always start with a Latin letter. The recommended format is to use one word (or two, or three; as few as possible) to describe the item content. Examples are reserved and enterConverse.
  • uiid: The item’s full Unique Item Identifier (UIID). The recommended format is to start with the questionnaire’s UQID prefix, followed by an underscore, followed by the UIID prefix, followed by an underscore, and a string of characters as produced by the {psyverse} R package (see the manual at https://psyverse.opens.science) or the Elsa Shiny App (see https://opens.science/apps/elsa). Examples are bfi10eng_reserved_7sp9pd51 and eq60eng_enterConverse_7rs8rnq8.
  • question_text: The text of the question, in the format specified in the content_type column. Together with the item’s response registration template (see the rrTemplateId column) the question text forms the item.
  • rrTemplateId: The item’s response registration template. This is a response registration template identifier of a response registration template described in the response_registration_templates worksheet.
  • content_type: The content type of the question text. At present, this can only be “html” (for HTML), “text” (for plain text), or “markdown” (for Markdown formatted text). In principle “image”, “audio” and “video” can also be used, but at present, none of the adapters process those content types yet.
  • subscale: Optionally, a subscale of the item. These can be referred to in the content for the aggregation_procedure_narrative field in the metadata worksheet, to enable aggregating subsets of items.
  • comments: Any comments you may want to make.
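The recommended UIID format can be sketched in a few lines of Python. The helper names are invented, and the sketch assumes that the final string of characters (the part produced by {psyverse} or Elsa) contains no underscore, as in the two examples above; the code does not generate that string itself.

```python
def compose_uiid(uqid_prefix: str, uiid_prefix: str, suffix: str) -> str:
    """Join the UQID prefix, the UIID prefix, and the generated string."""
    return f"{uqid_prefix}_{uiid_prefix}_{suffix}"


def split_uiid(uiid: str):
    """Split a UIID in the recommended format back into its three parts."""
    parts = uiid.split("_")
    if len(parts) != 3:
        raise ValueError(f"Not in the recommended UIID format: {uiid!r}")
    return tuple(parts)


print(compose_uiid("bfi10eng", "reserved", "7sp9pd51"))
# bfi10eng_reserved_7sp9pd51
print(split_uiid("eq60eng_enterConverse_7rs8rnq8"))
# ('eq60eng', 'enterConverse', '7rs8rnq8')
```

Splitting on underscores works because both prefixes can only contain letters and digits, so an underscore can only be a separator.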

The response_registration_templates worksheet

This worksheet has the following columns:

  • template_id: The response registration template identifier. All response options with the same response registration template identifier together form that response registration template. Response registration template identifiers can only contain Latin letters (a-z and A-Z) and Arabic digits (0-9), and must always start with a Latin letter.
  • response_option_content: The content of the response option, in the format specified in the content_type column.
  • identifier: The response option identifier within the response registration template. Response option identifiers can only contain Latin letters (a-z and A-Z) and Arabic digits (0-9), and must always start with a Latin letter. This will be used by SOQ or TOQ adapters and can be used to unequivocally and efficiently refer to a response option in a questionnaire.
  • content_type: The content type of the response option. At present, this can only be “html” (for HTML), “text” (for plain text), or “markdown” (for Markdown formatted text). In principle “image”, “audio” and “video” can also be used, but at present, none of the adapters process those content types yet.
  • response_option_sequence: Numbers that indicate the order of the response options in each item.
  • response_option_value: The value to register when this response option represents the participant’s response.
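To show how rows in this worksheet combine into templates, here is a Python sketch that groups response options by template_id, orders them by response_option_sequence, and looks up the value to register for a chosen option. The template "agree3" and its options are invented for this example, and the function names are not part of any existing adapter.

```python
from collections import defaultdict


def build_templates(rows):
    """Group response options by template_id, ordered by their sequence."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["template_id"]].append(row)
    return {
        template_id: sorted(options, key=lambda r: r["response_option_sequence"])
        for template_id, options in grouped.items()
    }


def value_for(templates, template_id, identifier):
    """Return the value to register for a given response option."""
    for option in templates[template_id]:
        if option["identifier"] == identifier:
            return option["response_option_value"]
    raise KeyError(identifier)


# Hypothetical worksheet rows, deliberately listed out of order.
rows = [
    {"template_id": "agree3", "response_option_content": "Agree",
     "identifier": "agree", "content_type": "text",
     "response_option_sequence": 3, "response_option_value": 3},
    {"template_id": "agree3", "response_option_content": "Disagree",
     "identifier": "disagree", "content_type": "text",
     "response_option_sequence": 1, "response_option_value": 1},
    {"template_id": "agree3", "response_option_content": "Neutral",
     "identifier": "neutral", "content_type": "text",
     "response_option_sequence": 2, "response_option_value": 2},
]

templates = build_templates(rows)
print([option["identifier"] for option in templates["agree3"]])
# ['disagree', 'neutral', 'agree']
print(value_for(templates, "agree3", "agree"))  # 3
```

An item in the items worksheet would then point to this template by putting "agree3" in its rrTemplateId column.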

The adapters worksheet

This worksheet contains information for SOQ and TOQ adapters: software that can convert a questionnaire specified in a SOQ or TOQ specification to another format. This enables automatic creation of files that can be imported into software for data collection, such as LimeSurvey or {formr}, or even closed science (paywalled) software such as Qualtrics.

Each row in this worksheet specifies a directive for an adapter by specifying the field name and the contents. These directives are read by adapters to parametrize their conversion of this questionnaire.

This worksheet has the following columns:

  • target_format: The format for which this row specifies a directive. This is used by the relevant adapter when it reads this worksheet and collects its directives. As such, this has to be meaningful given how the adapter is written (i.e. it has to look for rows with these values).
  • target_wikidata_id: If available, a wikidata identifier for the format to which the adapter will export.
  • uiid: Either a star (*) to indicate that this directive pertains to all items, or the UIID to which this directive pertains.
  • field: The field for which this row specifies a directive.
  • content: The directive’s content.
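As a sketch of how an adapter might collect its directives, the Python function below selects the rows for one target format and one item. The target format names, the field "questionType", and its contents are all invented, and letting an item-specific directive override a star (*) directive is an assumption of this sketch, not something the format prescribes.

```python
def directives_for(rows, target_format, uiid):
    """Collect the directives that apply to one item for one target format."""
    directives = {}
    for row in rows:
        if row["target_format"] != target_format:
            continue
        if row["uiid"] == "*":
            # A star applies to all items; keep any item-specific value (assumption).
            directives.setdefault(row["field"], row["content"])
        elif row["uiid"] == uiid:
            # Item-specific directives take precedence over star directives (assumption).
            directives[row["field"]] = row["content"]
    return directives


# Hypothetical adapter rows.
rows = [
    {"target_format": "limesurvey", "target_wikidata_id": "", "uiid": "*",
     "field": "questionType", "content": "array"},
    {"target_format": "limesurvey", "target_wikidata_id": "",
     "uiid": "bfi10eng_reserved_7sp9pd51",
     "field": "questionType", "content": "list"},
    {"target_format": "formr", "target_wikidata_id": "", "uiid": "*",
     "field": "questionType", "content": "mc"},
]

print(directives_for(rows, "limesurvey", "bfi10eng_reserved_7sp9pd51"))
# {'questionType': 'list'}
print(directives_for(rows, "limesurvey", "eq60eng_enterConverse_7rs8rnq8"))
# {'questionType': 'array'}
```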

The flanking_content worksheet

This worksheet has the following columns:

  • position: Whether the content should appear before or after the items.
  • sequence: If multiple bits of content are specified, their order.
  • content: The content to present.
  • content_type: The content type (see the content_types worksheet).

The content_types worksheet

This worksheet enables processing different content types in different ways. It has the following columns:

  • id: The content type identifier.
  • comments: Any comments about this content type.