Text as Data Methods for Economics: Analysing Text, Image, and Audio with Python

This workshop introduces economists to methods for analysing text and other forms of unstructured data using Python. Participants will learn how to transform text documents into structured datasets, apply natural language processing (NLP) and machine learning methods, and incorporate text-based measures into empirical economic research.

£200.00
2.5 Days
Online via Teams
Stata

Overview

This workshop introduces economists to methods for analysing text and other forms of unstructured data using Python. Participants will learn how to transform text documents into structured datasets, apply natural language processing (NLP) and machine learning methods, and incorporate text-based measures into empirical economic research.

The workshop covers the full workflow of text-as-data research: corpus construction, text preprocessing, document representations, supervised and unsupervised machine learning, topic models, and word embeddings. It also introduces modern NLP tools based on transformer models and large language models (LLMs), and briefly discusses how similar approaches can be applied to other types of unstructured data such as images and audio.
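
To make the first step of this workflow concrete, the sketch below turns a small corpus into a document-term matrix using only the Python standard library. It is an illustration under stated assumptions rather than workshop material: the three example sentences, the helper names `tokenise` and `document_term_matrix`, and the simple letters-only tokenisation rule are all invented here.

```python
import re
from collections import Counter

# Toy corpus invented for illustration; not taken from the workshop materials.
documents = [
    "Inflation rose as energy prices increased.",
    "The central bank raised interest rates to curb inflation.",
    "Energy prices fell, easing pressure on interest rates.",
]


def tokenise(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, matrix), where matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenise(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, matrix


vocabulary, dtm = document_term_matrix(documents)

# Each row is one document; each column is one vocabulary term.
for row in dtm:
    print(row)
```

Each row of `dtm` can then be treated like any other observation in a structured dataset, for example by merging it with document-level metadata. In practice a library vectoriser would usually replace the hand-written helper.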

 

The workshop combines lectures with hands-on coding exercises in Python. All of the content can also be applied in Stata through its PyStata feature.

Topics Covered

The course will cover the following topics:

  • Introduction to text as data in economics

  • Text processing and tokenisation

  • Document representations and similarity measures

  • Dictionary methods and text-based indicators

  • Supervised machine learning for text classification

  • Topic models and unsupervised learning

  • Word embeddings and semantic analysis

  • Modern NLP methods and large language models

  • Images as data for economists

  • Audio and speech data for economists
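
As a rough indication of what two of these topics involve, the sketch below computes a dictionary-based indicator and a cosine similarity between two documents. It is a minimal example under stated assumptions: the term list `UNCERTAINTY_TERMS`, the function names, and the example sentences are hypothetical and are not drawn from the course materials.

```python
import math
import re
from collections import Counter

# Hypothetical term list, chosen only to illustrate a dictionary method.
UNCERTAINTY_TERMS = {"uncertain", "uncertainty", "risk", "risks"}


def tokenise(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_share(text, terms):
    """Share of tokens in `text` that belong to the term list."""
    tokens = tokenise(text)
    if not tokens:
        return 0.0
    return sum(token in terms for token in tokens) / len(tokens)


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a, b = Counter(tokenise(text_a)), Counter(tokenise(text_b))
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


share = dictionary_share("Uncertainty about policy raises risk", UNCERTAINTY_TERMS)
similarity = cosine_similarity("energy prices rose", "energy prices fell")
print(share, similarity)
```

The share can serve as a simple text-based indicator per document, while the similarity score compares any pair of documents on the same word-count representation.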

Course Structure

Total Duration: 15 hours

  • The workshop will be delivered over two and a half days
  • It comprises 10 sessions of 1 hour 30 minutes each
  • Each session consists of a 1-hour lecture and a 30-minute coding session

Agenda

Day 1:

Session 1: Introduction to Text as Data in Economics
Session 2: Text Preprocessing and Tokenisation
Session 3: Document Representation and Similarity
Session 4: Dictionary Methods and Text Measures
Day 2:

Session 5: Supervised Machine Learning with Text
Session 6: Topic Models and Unsupervised Learning
Session 7: Word Embeddings and Semantic Analysis
Session 8: Modern NLP Methods
Day 3:

Session 9: Images as Data for Economists
Session 10: Audio and Speech Data for Economists

Prerequisites

Participants are expected to have a background in econometrics. No prior experience with natural language processing (NLP) is required. Basic familiarity with Python is helpful but not required. Participants will be provided with a short introductory Google Colab notebook covering basic Python concepts and the tools used in the workshop, and are expected to work through it before the workshop starts.

 

Software and Technical Requirements

All coding sessions will be conducted in Python using Google Colab, a cloud-based environment for running Python notebooks.

Participants do not need to install any software on their computers. A Google account and a web browser are sufficient to participate in the hands-on exercises.

Course Timetable

Subject to minor changes. All times are London time.

Day | Morning Session 1 | Morning Session 2 | Afternoon Session 1 | Afternoon Session 2
Day One | 9.30am-11.00am | 11.15am-12.45pm | 2.00pm-3.30pm | 3.45pm-5.15pm
Day Two | 9.30am-11.00am | 11.15am-12.45pm | 2.00pm-3.30pm | 3.45pm-5.15pm
Day Three | 9.30am-11.00am | 11.15am-12.45pm | |

Terms

  • Student registrations: Attendees must provide proof of full-time student status at the time of booking to qualify for the student registration rate (valid student ID card or authorised letter of enrolment).
  • Additional discounts are available for multiple registrations.
  • Delegates are provided with temporary licences for the software used in the course and will be instructed to download and install the software prior to the start of the course.
  • Payment of course fees is required prior to the course start date.
  • Registration closes 5 calendar days prior to the start of the course.
  • 100% of the fee is returned for cancellations made more than 28 calendar days prior to the start of the course.
  • 50% of the fee is returned for cancellations made between 14 and 28 calendar days prior to the start of the course.
  • No fee is returned for cancellations made less than 14 calendar days prior to the start of the course.

 

The number of delegates is restricted. Please register early to guarantee your place.

Delivered By