Presentation

codefinder: optimising Stata for the analysis of large, routinely collected healthcare data

13 September 2024

Jonathan Batty, Marlous Hall

Abstract

Routinely collected healthcare data (including electronic healthcare records and administrative data) are increasingly available at whole-population scale and may span decades of data collection. These data may be analysed as part of clinical, pharmacoepidemiologic and health services research, producing insights that improve future clinical care. However, the analysis of healthcare data at this scale presents several distinct challenges. These include the storage of diagnosis, medication and procedure codes in multiple discordant coding systems (including ICD-9, ICD-10, SNOMED-CT, Read codes, etc.) and the inherently relational nature of the data (each patient has multiple clinical contacts, during which multiple codes may be recorded). Pre-processing and analysing these data using optimised methods has several benefits, including reduced computational requirements, analytic time, carbon footprint and cost.
We will focus on one of the main issues faced by the healthcare data analyst: how to most efficiently collapse multiple, disparate diagnosis codes (stored as strings across a number of variables) into a discrete disease entity, using a pre-defined code list. Several approaches (including the use of Boolean logic, the inlist function, string functions and regular expressions) will be benchmarked in turn in a large, real-world healthcare dataset (n = 192 million hospitalisation episodes during a 12-year period; approximately 1 terabyte of data). The time and space complexity of each approach, in addition to its carbon footprint, will be reported. The most efficient strategy has been implemented in our newly developed Stata command, codefinder, which will be discussed.
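As a rough illustration of the underlying task only, the sketch below flags episodes in which any diagnosis field matches a code list, once by comparing the code's first three characters against a set and once with a regular expression. It is written in Python rather than Stata and is not the codefinder implementation. The field names (diag1 to diag3), the toy records and the two-stem code list are all hypothetical.

```python
import re

# Hypothetical toy records: each hospital episode stores several diagnosis
# codes as strings, one per field (empty string = no code recorded).
episodes = [
    {"diag1": "I210", "diag2": "E119", "diag3": ""},
    {"diag1": "J459", "diag2": "", "diag3": ""},
    {"diag1": "K219", "diag2": "I219", "diag3": "I10"},
]

# Hypothetical code list: three-character ICD-10 stems for myocardial infarction.
CODE_LIST = {"I21", "I22"}
DIAG_FIELDS = ("diag1", "diag2", "diag3")


def flag_by_prefix(episode):
    """True if any diagnosis field starts with a stem in the code list."""
    return any(episode[field][:3] in CODE_LIST for field in DIAG_FIELDS)


# The same rule expressed as a regular expression, compiled once and reused.
PATTERN = re.compile(r"^(?:I21|I22)")


def flag_by_regex(episode):
    """True if any diagnosis field matches the code-list pattern."""
    return any(PATTERN.match(episode[field]) is not None for field in DIAG_FIELDS)


prefix_flags = [flag_by_prefix(e) for e in episodes]
regex_flags = [flag_by_regex(e) for e in episodes]
```

Both functions collapse several string fields into a single disease indicator per episode. Which formulation is faster on real data is an empirical question of the kind the benchmarking described above addresses.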
