Dealing with classifications

When you request an export of the raw classification data using the project builder, some of the columns we return will actually contain values in a format called JSON. We do this, because sadly sometimes the kinds of data we track are too complicated to easily fit into a table structure. However, using Python it’s actually really easy to pull out the data you need from those JSON-based columns:

import pandas
import json

data = pandas.read_csv("classifications.csv")

data["annotations"]  = data["annotations"].map(json.loads)
data["metadata"]     = data["metadata"].map(json.loads)
data["subject_data"] = data["subject_data"].map(json.loads)

This will turn those columns into a normal Python dict. If you know your project has classifications pertaining to a single subject at a time, you can make things even simpler with a further step:

def flatten_subject_info(subject_data):
  result = subject_data.values()[0]
  result.update({'id': subject_data.keys()[0]})
  return result

data["subject_data"] = data["subject_data"].map(flatten_subject_info)

Note

This example assumes you have installed the Python library called Pandas. Many scientific Python distributions include this library, but you can install this with pip install pandas otherwise.

Todo

expand