Data Pipeline
This example shows how to build a data pipeline using vgrid convert and vgrid calc — filtering rows, selecting columns, and computing aggregates without leaving the terminal.
The data
Section titled “The data”transactions.csv — a bank export:
Date,Description,Amount,Category,Status2025-01-02,Acme Corp Invoice,$1200.00,Revenue,Cleared2025-01-03,Office Supplies,-$85.50,Expenses,Cleared2025-01-05,Google Workspace,-$14.40,Expenses,Pending2025-01-06,Client Payment,$3500.00,Revenue,Cleared2025-01-07,AWS Monthly,-$247.80,Expenses,Pending2025-01-08,Refund - Acme Corp,-$200.00,Revenue,PendingFilter rows
Section titled “Filter rows”# All pending transactionsvgrid convert transactions.csv -t csv --headers --where 'Status=Pending'
# Pending expenses only (multiple filters combine as AND)vgrid convert transactions.csv -t csv --headers \ --where 'Status=Pending' --where 'Amount<0'
# Search descriptions (case-insensitive substring)vgrid convert transactions.csv -t csv --headers \ --where 'Description~"acme"'Numeric comparisons automatically handle financial formatting — $1,200.00 is parsed as 1200.00.
Select and reorder columns
Section titled “Select and reorder columns”# Pick specific columnsvgrid convert transactions.csv -t csv --headers \ --select 'Date,Description,Amount'
# Filter and select (--where runs before --select)vgrid convert transactions.csv -t csv --headers \ --where 'Category=Expenses' --select 'Description,Amount'--where can reference columns not in --select — filtering happens before projection.
Aggregate with calc
Section titled “Aggregate with calc”Pipe filtered output into calc for totals:
# Total pending chargesvgrid convert transactions.csv -t csv --headers \ --where 'Status=Pending' --select Amount | \ vgrid calc '=SUM(A:A)' -f csv
# Count of expense transactionsvgrid convert transactions.csv -t csv --headers \ --where 'Category=Expenses' | \ vgrid calc '=COUNTA(A:A)-1' -f csv --headers
# Average revenue amountvgrid convert transactions.csv -t csv --headers \ --where 'Category=Revenue' --select Amount | \ vgrid calc '=AVERAGE(A:A)' -f csvJSON output for scripts
Section titled “JSON output for scripts”# Pending transactions as JSON objects (headers become keys)vgrid convert transactions.csv -t json --headers --where 'Status=Pending'[ {"Date":"2025-01-05","Description":"Google Workspace","Amount":"-$14.40","Category":"Expenses","Status":"Pending"}, {"Date":"2025-01-07","Description":"AWS Monthly","Amount":"-$247.80","Category":"Expenses","Status":"Pending"}, {"Date":"2025-01-08","Description":"Refund - Acme Corp","Amount":"-$200.00","Category":"Revenue","Status":"Pending"}]Multi-step pipeline
Section titled “Multi-step pipeline”Combine tools for a full workflow:
# 1. Extract pending expenses from XLSXvgrid convert report.xlsx -t csv --headers \ --sheet "Transactions" \ --where 'Status=Pending' --where 'Amount<0' \ --select 'Date,Description,Amount' \ -o pending_expenses.csv
# 2. Compute the totalTOTAL=$(vgrid convert pending_expenses.csv -t csv --headers \ --select Amount | vgrid calc '=SUM(A:A)' -f csv)echo "Total pending expenses: $TOTAL"
# 3. Diff against expected valuesvgrid diff pending_expenses.csv expected.csv \ --key Description --tolerance 0.01 --no-fail --out jsonQuick reference
Section titled “Quick reference”| Task | Command |
|---|---|
| Filter rows | convert --where 'Col=Value' |
| Select columns | convert --select 'A,B,C' |
| Sum a column | calc '=SUM(A:A)' -f csv |
| Count rows | calc '=COUNTA(A:A)' -f csv |
| Average | calc '=AVERAGE(A:A)' -f csv |
| Conditional sum | calc '=SUMIF(B:B,">0",A:A)' -f csv |
| Convert format | convert input.xlsx -t json --headers |