# Data pipelines

This page covers how to get your scattered data out of various services and into CSV format, ready for import into Obsidian.
**Personal Data Hub series**

- Personal Data Hub
- Foundations
- Data pipelines (you are here)
- QuickAdd scripts
- Chrome extensions
- Plugins
## The problem

My data was scattered everywhere. Health metrics locked in Apple Health. Purchased books split between Amazon and Apple. Watch history spread across eight streaming services. Cancel a service, and that history is gone until you pay again.
Your data belongs to you. Getting it back often requires detective work, custom tools, and digging into databases you were never meant to see.
## 1. Health pipeline

Apple does not provide an easy way to export Health data to CSV, so I use an app called HealthFit as a bridge, with Google Sheets as the export destination.

### Step-by-step
- Configure HealthFit - Add Google Sheets as an export destination. This creates a Health Metrics spreadsheet with tabs for daily metrics, weight, sleep, and mindfulness.
- Create Google Script - Add a daily export script that extracts each tab to CSV:
```javascript
function exportHealthData() {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  const folder = DriveApp.getFolderById('YOUR_FOLDER_ID');
  const sheets = ['Metrics', 'Sleep', 'Weight', 'Mindfulness'];
  sheets.forEach(name => {
    const sheet = ss.getSheetByName(name);
    const csv = convertToCSV(sheet);
    folder.createFile(name.toLowerCase() + '.csv', csv, MimeType.CSV);
  });
}

function convertToCSV(sheet) {
  const data = sheet.getDataRange().getValues();
  // Quote every cell and double any embedded quotes so commas
  // inside values do not break the column layout
  return data
    .map(row => row.map(cell => '"' + String(cell).replace(/"/g, '""') + '"').join(','))
    .join('\n');
}
```
- Add trigger - Set a time-driven trigger for 5am daily.
- Configure Google Drive - Install the Google Drive app on your Mac. Configure it to sync the export folder.
- Create bash script - Copy CSVs to your vault's import directory:
```shell
#!/bin/bash
SOURCE="$HOME/Google Drive/My Drive/HealthExports"
DEST="$HOME/Documents/vault/csv-imports/health/Health"

mkdir -p "$DEST"            # ensure the import directory exists
cp "$SOURCE"/*.csv "$DEST/"
```
- Schedule with Apple Shortcuts - Create a shortcut that runs the shell script at 6am daily.
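Because this pipeline spans four systems, a silent failure at any step is easy to miss. A small check you could run before the import might look like this (a sketch: the file names follow the export script above, and the vault path and `check_fresh` helper are my own assumptions):

```shell
#!/bin/bash
# Sketch: warn if any expected health CSV is missing or older than a day.
# The vault path is an assumption -- substitute your own.
check_fresh() {
  local dest="$1"
  local bad=0
  for name in metrics sleep weight mindfulness; do
    local file="$dest/$name.csv"
    if [ ! -f "$file" ]; then
      echo "MISSING: $file"; bad=1
    elif [ -n "$(find "$file" -mmin +1440)" ]; then
      echo "STALE: $file"; bad=1   # not modified in the last 24 hours
    fi
  done
  return "$bad"
}

check_fresh "$HOME/Documents/vault/csv-imports/health/Health" || echo "health export incomplete"
```

Running this from the same Shortcuts automation, just before the import, turns a quiet sync failure into a visible warning.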
**Result:** Every morning I wake up with yesterday's health and workout data ready to import, with zero effort.
## 2. Amazon books

Amazon has my entire purchase history for both Kindle and physical books. Getting it should be easy, right?

### The official way (slow)
- Request order history from Amazon - may take months
- Amazon sends email, confirm within 5 days
- Amazon sends download link, download within 90 days
- Extract the data, only to discover it is barely usable - no product type, just product name and date
### The Chrome extension way (fast)
I built my own Chrome extension while waiting for the first email. It scrapes the purchase history page and exports to CSV.
See: Chrome extensions for installation and usage.
The extension captures:
| Field | Source |
|---|---|
| Product name | Page scrape |
| Purchase date | Page scrape |
| Price | Page scrape |
| Order ID | Page scrape |
I still had to remove non-books manually and add a format column (ebook, audiobook, paperback), but at least I had the data.
## 3. Apple Books

Getting my purchase history from Apple was much easier, as it is stored in a SQLite database on my Mac.

### Database location

```
~/Library/Containers/com.apple.iBooksX/Data/Documents/BKLibrary/BKLibrary-1-091020131601.sqlite
```

The exact filename varies. Look for the .sqlite file in that directory.
### Extraction script

```shell
#!/bin/bash
DB=$(ls ~/Library/Containers/com.apple.iBooksX/Data/Documents/BKLibrary/*.sqlite | head -1)

sqlite3 -header -csv "$DB" "
SELECT
  ZTITLE as title,
  ZAUTHOR as author,
  ZGENRE as genre,
  ZASSETID as asset_id,
  -- Core Data stores dates as seconds since 2001-01-01, hence the offset
  datetime(ZCREATIONDATE + 978307200, 'unixepoch') as purchase_date,
  CASE
    WHEN ZEPUBID IS NOT NULL THEN 'ebook'
    WHEN ZAUDIOBOOKID IS NOT NULL THEN 'audiobook'
    ELSE 'pdf'
  END as format
FROM ZBKLIBRARYASSET
WHERE ZISPURCHASED = 1
ORDER BY purchase_date DESC
" > apple-books-export.csv
```
This creates a clean CSV with ownership detection (only exports books you own) and automatic type classification.
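The `978307200` offset in the query converts Apple's Core Data epoch (seconds since 2001-01-01) to the Unix epoch that SQLite's `datetime()` expects. A quick sanity check, assuming `sqlite3` is on your PATH:

```shell
# A Core Data timestamp of 0 should map to midnight on 2001-01-01,
# the start of the Apple reference epoch
sqlite3 :memory: "SELECT datetime(0 + 978307200, 'unixepoch');"
# 2001-01-01 00:00:00
```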
## 4. Watch history archaeology

### Prime Video and Netflix
For Prime and Netflix, I built Chrome extensions that capture viewing history directly to my vault.
See: Chrome extensions for the direct-to-vault approach.
### Emby (self-hosted media server)

Emby was different. It does not expose when you watched something. It knows, but does not tell you.

The Emby database lives on my NAS at:

```
/share/ZFS530_DATA/.qpkg/EmbyServer/programdata/data/activitylog.db
```

Your path will vary depending on your NAS and installation method.
### Emby extraction

- Copy the database to your Mac (do not modify it in place)
- Query the activity log:
```sql
SELECT
  Name as title,
  datetime(DateCreated) as watched_date,
  Type as media_type,
  ItemId as item_id
FROM ActivityLog
WHERE Type IN ('Episode', 'Movie')
ORDER BY DateCreated DESC
```
- Export to CSV:

```shell
sqlite3 -header -csv activitylog.db "SELECT ..." > emby-watch-history.csv
```
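The steps above can be sketched as one script. The NAS hostname, user, and remote path here are assumptions based on my QNAP setup; adjust them for your own install:

```shell
#!/bin/bash
# Sketch: copy the Emby activity log off the NAS, then export watch history.
# NAS_HOST and REMOTE_DB are assumptions -- substitute your own values.
NAS_HOST="admin@nas.local"
REMOTE_DB="/share/ZFS530_DATA/.qpkg/EmbyServer/programdata/data/activitylog.db"
LOCAL_DB="/tmp/activitylog.db"

# Always work on a copy -- never query the live database in place
scp "$NAS_HOST:$REMOTE_DB" "$LOCAL_DB"

sqlite3 -header -csv "$LOCAL_DB" "
  SELECT Name AS title,
         datetime(DateCreated) AS watched_date,
         Type AS media_type,
         ItemId AS item_id
  FROM ActivityLog
  WHERE Type IN ('Episode', 'Movie')
  ORDER BY DateCreated DESC
" > emby-watch-history.csv
```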
I managed to recover almost five years of history this way. It took some time, but it was worth it.
### Emby Playback Reporting plugin
For ongoing tracking, I installed the Playback Reporting plugin. It only tracks from the install date, not retrospectively, but it provides a proper API for future exports.
## 5. Other streaming services
Once you cancel a streaming subscription, there is no way to get your watch history back. This is why I now capture everything as I watch it using the Chrome extensions.
For historical data from cancelled services like Disney+, Paramount, and Binge, I had to rely on:
- Email receipts
- Bank statements for subscription dates
- Memory (unreliable)
**Lesson learned:** Capture your data while you still have access.
**Next:** QuickAdd scripts covers how to import the CSV data into Obsidian notes.