Skip to content

Data Pipeline

This example shows how to build a data pipeline using vgrid convert and vgrid calc — filtering rows, selecting columns, and computing aggregates without leaving the terminal.

transactions.csv — a bank export:

Date,Description,Amount,Category,Status
2025-01-02,Acme Corp Invoice,$1200.00,Revenue,Cleared
2025-01-03,Office Supplies,-$85.50,Expenses,Cleared
2025-01-05,Google Workspace,-$14.40,Expenses,Pending
2025-01-06,Client Payment,$3500.00,Revenue,Cleared
2025-01-07,AWS Monthly,-$247.80,Expenses,Pending
2025-01-08,Refund - Acme Corp,-$200.00,Revenue,Pending
Terminal window
# All pending transactions
vgrid convert transactions.csv -t csv --headers --where 'Status=Pending'
# Pending expenses only (multiple filters combine as AND)
vgrid convert transactions.csv -t csv --headers \
--where 'Status=Pending' --where 'Amount<0'
# Search descriptions (case-insensitive substring)
vgrid convert transactions.csv -t csv --headers \
--where 'Description~"acme"'

Numeric comparisons automatically handle financial formatting — $1,200.00 is parsed as 1200.00.

Terminal window
# Pick specific columns
vgrid convert transactions.csv -t csv --headers \
--select 'Date,Description,Amount'
# Filter and select (--where runs before --select)
vgrid convert transactions.csv -t csv --headers \
--where 'Category=Expenses' --select 'Description,Amount'

--where can reference columns not in --select — filtering happens before projection.

Pipe filtered output into calc for totals:

Terminal window
# Total pending charges
vgrid convert transactions.csv -t csv --headers \
--where 'Status=Pending' --select Amount | \
vgrid calc '=SUM(A:A)' -f csv
# Count of expense transactions
vgrid convert transactions.csv -t csv --headers \
--where 'Category=Expenses' | \
vgrid calc '=COUNTA(A:A)-1' -f csv --headers
# Average revenue amount
vgrid convert transactions.csv -t csv --headers \
--where 'Category=Revenue' --select Amount | \
vgrid calc '=AVERAGE(A:A)' -f csv
Terminal window
# Pending transactions as JSON objects (headers become keys)
vgrid convert transactions.csv -t json --headers --where 'Status=Pending'
[
{"Date":"2025-01-05","Description":"Google Workspace","Amount":"-$14.40","Category":"Expenses","Status":"Pending"},
{"Date":"2025-01-07","Description":"AWS Monthly","Amount":"-$247.80","Category":"Expenses","Status":"Pending"},
{"Date":"2025-01-08","Description":"Refund - Acme Corp","Amount":"-$200.00","Category":"Revenue","Status":"Pending"}
]

Combine tools for a full workflow:

Terminal window
# 1. Extract pending expenses from XLSX
vgrid convert report.xlsx -t csv --headers \
--sheet "Transactions" \
--where 'Status=Pending' --where 'Amount<0' \
--select 'Date,Description,Amount' \
-o pending_expenses.csv
# 2. Compute the total
TOTAL=$(vgrid convert pending_expenses.csv -t csv --headers \
--select Amount | vgrid calc '=SUM(A:A)' -f csv)
echo "Total pending expenses: $TOTAL"
# 3. Diff against expected values
vgrid diff pending_expenses.csv expected.csv \
--key Description --tolerance 0.01 --no-fail --out json
TaskCommand
Filter rowsconvert --where 'Col=Value'
Select columnsconvert --select 'A,B,C'
Sum a columncalc '=SUM(A:A)' -f csv
Count rowscalc '=COUNTA(A:A)' -f csv
Averagecalc '=AVERAGE(A:A)' -f csv
Conditional sumcalc '=SUMIF(B:B,">0",A:A)' -f csv
Convert formatconvert input.xlsx -t json --headers