# Data pipelines

This page covers how to get your scattered data out of various services and into CSV format, ready for import into Obsidian.
**Personal Data Hub series**

- Personal Data Hub
- Foundations
- Data pipelines (you are here)
- QuickAdd scripts
- Chrome extensions
- Plugins
## The problem

My data was scattered everywhere. Health metrics locked in Apple Health. Purchased books split between Amazon and Apple. Watch history spread across eight streaming services. Cancel a service, and that history is gone until you pay again.
Your data belongs to you. Getting it back often requires detective work, custom tools, and digging into databases you were never meant to see.
## 1. Health pipeline

Apple does not provide an easy way to export Health data to CSV, so I use an app called HealthFit as a bridge, with Google Sheets as the export destination.

### Step-by-step
- Configure HealthFit - Add Google Sheets as an export destination. This creates a Health Metrics spreadsheet with tabs for daily metrics, weight, sleep, and mindfulness.
- Create Google Script - Add a daily export script that extracts each tab to CSV:
```javascript
function exportHealthData() {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  const folder = DriveApp.getFolderById('YOUR_FOLDER_ID');
  const sheets = ['Metrics', 'Sleep', 'Weight', 'Mindfulness'];
  sheets.forEach(name => {
    const sheet = ss.getSheetByName(name);
    const csv = convertToCSV(sheet);
    folder.createFile(name.toLowerCase() + '.csv', csv, MimeType.CSV);
  });
}

function convertToCSV(sheet) {
  const data = sheet.getDataRange().getValues();
  // Quote every cell and double any embedded quotes so commas
  // inside values do not break the column layout
  return data
    .map(row => row.map(cell => '"' + String(cell).replace(/"/g, '""') + '"').join(','))
    .join('\n');
}
```
- Add trigger - Set a time-driven trigger for 5am daily.
- Configure Google Drive - Install the Google Drive app on your Mac. Configure it to sync the export folder.
- Create bash script - Copy CSVs to your vault's import directory:
```shell
#!/bin/bash
SOURCE="$HOME/Google Drive/My Drive/HealthExports"
DEST="$HOME/Documents/vault/csv-imports/health/Health"

mkdir -p "$DEST"            # ensure the import directory exists
cp "$SOURCE"/*.csv "$DEST/"
```
- Schedule with Apple Shortcuts - Create a shortcut that runs the shell script at 6am daily.
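Because this pipeline spans four systems, a silent failure at any step is easy to miss. A small check you could run before the import might look like this (a sketch: the file names follow the export script above, and the vault path and `check_fresh` helper are my own assumptions):

```shell
#!/bin/bash
# Sketch: warn if any expected health CSV is missing or older than a day.
# The vault path is an assumption -- substitute your own.
check_fresh() {
  local dest="$1"
  local bad=0
  for name in metrics sleep weight mindfulness; do
    local file="$dest/$name.csv"
    if [ ! -f "$file" ]; then
      echo "MISSING: $file"; bad=1
    elif [ -n "$(find "$file" -mmin +1440)" ]; then
      echo "STALE: $file"; bad=1   # not modified in the last 24 hours
    fi
  done
  return "$bad"
}

check_fresh "$HOME/Documents/vault/csv-imports/health/Health" || echo "health export incomplete"
```

Running this from the same Shortcuts automation, just before the import, turns a quiet sync failure into a visible warning.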
**Result:** Every morning I wake up with yesterday's health and workout data ready to import, with zero effort.
## 2. Amazon books

Amazon has my entire purchase history for both Kindle and physical books. Getting it should be easy, right?

### The official way (slow)
- Request order history from Amazon - may take months
- Amazon sends email, confirm within 5 days
- Amazon sends download link, download within 90 days
- Extract the data, only to discover it is barely usable - no product type, just product name and date
### The Chrome extension way (fast)
I built my own Chrome extension while waiting for the first email. It scrapes the purchase history page and exports to CSV.
See: Chrome extensions for installation and usage.
The extension captures:
| Field | Source |
|---|---|
| Product name | Page scrape |
| Purchase date | Page scrape |
| Price | Page scrape |
| Order ID | Page scrape |
I still had to remove non-books manually and add a format column (ebook, audiobook, paperback), but at least I had the data.
## 3. Apple Books

Getting my purchase history from Apple was much easier, as it is stored in a SQLite database on my Mac.

### Database location

```
~/Library/Containers/com.apple.iBooksX/Data/Documents/BKLibrary/BKLibrary-1-091020131601.sqlite
```

The exact filename varies. Look for the .sqlite file in that directory.
### Extraction script

```shell
#!/bin/bash
DB=$(ls ~/Library/Containers/com.apple.iBooksX/Data/Documents/BKLibrary/*.sqlite | head -1)

sqlite3 -header -csv "$DB" "
SELECT
  ZTITLE as title,
  ZAUTHOR as author,
  ZGENRE as genre,
  ZASSETID as asset_id,
  -- Core Data stores dates as seconds since 2001-01-01, hence the offset
  datetime(ZCREATIONDATE + 978307200, 'unixepoch') as purchase_date,
  CASE
    WHEN ZEPUBID IS NOT NULL THEN 'ebook'
    WHEN ZAUDIOBOOKID IS NOT NULL THEN 'audiobook'
    ELSE 'pdf'
  END as format
FROM ZBKLIBRARYASSET
WHERE ZISPURCHASED = 1
ORDER BY purchase_date DESC
" > apple-books-export.csv
```
This creates a clean CSV with ownership detection (only exports books you own) and automatic type classification.
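The `978307200` offset in the query converts Apple's Core Data epoch (seconds since 2001-01-01) to the Unix epoch that SQLite's `datetime()` expects. A quick sanity check, assuming `sqlite3` is on your PATH:

```shell
# A Core Data timestamp of 0 should map to midnight on 2001-01-01,
# the start of the Apple reference epoch
sqlite3 :memory: "SELECT datetime(0 + 978307200, 'unixepoch');"
# 2001-01-01 00:00:00
```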
## 4. Watch history archaeology

### Prime Video and Netflix
For Prime and Netflix, I built Chrome extensions that capture viewing history directly to my vault.
See: Chrome extensions for the direct-to-vault approach.
### Emby (self-hosted media server)

Emby was different. It does not expose when you watched something. It knows, but does not tell you.

The Emby database lives on my NAS at:

```
/share/ZFS530_DATA/.qpkg/EmbyServer/programdata/data/activitylog.db
```

Your path will vary depending on your NAS and installation method.
### Emby extraction

- Copy the database to your Mac (do not modify it in place)
- Query the activity log:
```sql
SELECT
  Name as title,
  datetime(DateCreated) as watched_date,
  Type as media_type,
  ItemId as item_id
FROM ActivityLog
WHERE Type IN ('Episode', 'Movie')
ORDER BY DateCreated DESC
```
- Export to CSV:

```shell
sqlite3 -header -csv activitylog.db "SELECT ..." > emby-watch-history.csv
```
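The steps above can be sketched as one script. The NAS hostname, user, and remote path here are assumptions based on my QNAP setup; adjust them for your own install:

```shell
#!/bin/bash
# Sketch: copy the Emby activity log off the NAS, then export watch history.
# NAS_HOST and REMOTE_DB are assumptions -- substitute your own values.
NAS_HOST="admin@nas.local"
REMOTE_DB="/share/ZFS530_DATA/.qpkg/EmbyServer/programdata/data/activitylog.db"
LOCAL_DB="/tmp/activitylog.db"

# Always work on a copy -- never query the live database in place
scp "$NAS_HOST:$REMOTE_DB" "$LOCAL_DB"

sqlite3 -header -csv "$LOCAL_DB" "
  SELECT Name AS title,
         datetime(DateCreated) AS watched_date,
         Type AS media_type,
         ItemId AS item_id
  FROM ActivityLog
  WHERE Type IN ('Episode', 'Movie')
  ORDER BY DateCreated DESC
" > emby-watch-history.csv
```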
I managed to recover almost five years of history this way. It took some time, but it was worth it.
### Emby Playback Reporting plugin
For ongoing tracking, I installed the Playback Reporting plugin. It only tracks from the install date, not retrospectively, but it provides a proper API for future exports.
## 5. Other streaming services
Once you cancel a streaming subscription, there is no way to get your watch history back. This is why I now capture everything as I watch it using the Chrome extensions.
For historical data from cancelled services like Disney+, Paramount, and Binge, I had to rely on:
- Email receipts
- Bank statements for subscription dates
- Memory (unreliable)
**Lesson learned:** Capture your data while you still have access.
**Next:** QuickAdd scripts covers how to import the CSV data into Obsidian notes.