Building a Clean Data Pipeline Off Toast and Square

The Toast or Square account at your restaurant captures more useful operational data in a day than most operators see in a quarter. Every order, every modifier, every void, every comp, every check-by-server, every shift-by-station, every minute-by-minute sales velocity — it is all there, sitting in the POS database, waiting for someone to actually use it.

Most operators don't. They look at the daily sales summary and the weekly mix report. They glance at the labor cost percent. They open the comp report when something feels wrong. The rest of the data — easily 90% of what the POS captures — sits unused, generating no decisions, no diagnostics, no operational improvements.

This post is the data pipeline we build during data integration engagements. The pipeline pulls data out of Toast or Square (or both), lands it in a place you can actually use, and turns the POS from a transactional system into an operational intelligence platform. The work takes 30–60 days. The compounding returns are years of better decisions.

What the POS actually captures

A modern POS records, at minimum:

Every order at the item level, by daypart, by server, by table
Every modifier and substitution
Every comp, void, and re-fire, with reason codes
Time-stamps on order entry, fire, and payment
Server-level performance: cover count, average check, modifier rate, comp rate
Labor data: scheduled vs actual hours, by employee, by shift
Payment method breakdown: cash, card, gift card, third-party delivery, house account
Discount usage by promo code
Reservation-to-walk-in conversion (where reservations are integrated)

That is the minimum. The full data set includes a lot more — POS administrative actions, kitchen display system timing, refund records, training mode entries, manager override logs.

The data exists. The pipeline question is how to get it out, into a tool that can analyze it, in a form that supports daily and weekly operating decisions.

The right architecture

The pipeline has four layers.

Layer 1: Source — the POS itself

Toast and Square both offer:

In-platform reporting: pre-built reports inside the POS interface
CSV exports: manually downloadable reports
APIs: programmatic access for developers

For most operators, the in-platform reporting is what they use. It covers the basics. For everything beyond the basics, the APIs are required.

Layer 2: Extraction

The extraction layer pulls data out of the POS on a regular schedule. Options:

Manual CSV exports: works for small operations with a single user pulling reports weekly. Fragile, error-prone at scale.
Third-party integration tools: services like Stitch, Fivetran, or Airbyte that connect to POS APIs and pull data into a destination. Costs $100–$400/month at small scale.
Custom scripts: developer-built scripts that hit the POS API on a schedule. Costs $2K–$8K for initial build, near-zero ongoing.

For most independent operators with 1–4 locations, third-party integration tools are the right choice. The cost is modest, the technical lift is small, and the data flows reliably.

Layer 3: Storage

The extracted data needs a destination. Options:

Google Sheets: works for small data volumes (one location, summary level). Breaks at higher volumes.
A spreadsheet warehouse like Coefficient or Equals: better than Google Sheets, manageable at small scale.
A data warehouse like BigQuery, Snowflake, or Postgres: the right architecture for multi-location operations or for any operator wanting more than 90 days of detailed history.

For most independents, a small Postgres database or a BigQuery instance is the right tier. Cost: $0–$50/month. Setup: 4–8 hours one-time.

Layer 4: Analysis and reporting

The reporting layer is where the operator actually consumes the data. Options:

Looker Studio (formerly Google Data Studio): free, integrates with BigQuery and Google Sheets, sufficient for most independent operator needs
Tableau or Power BI: more sophisticated, $20–$70/user/month, overkill for most independents
Custom dashboards: built by a developer, $3K–$15K depending on complexity, justifiable for multi-location operators

The reporting layer is where decisions happen. The dashboards should be designed around the questions the operator wants answered weekly, not around what the POS can produce.

The questions the pipeline should answer

The pipeline is worth building only if it answers questions the operator actually wants to act on. The right questions for an independent restaurant:

Daily questions

Yesterday's net sales by daypart compared to forecast and to same-day-last-year
Yesterday's prime cost (labor + COGS) compared to target
Yesterday's comp and void totals, by reason
Any specific server, station, or daypart that materially deviated from expected

Weekly questions

Trailing 7-day mix shift versus trailing 28-day mix (what is selling more, what is selling less)
Trailing 7-day average check by daypart
Trailing 7-day server-level metrics: average check, modifier rate, comp rate
Trailing 7-day labor efficiency: covers per labor hour, sales per labor hour
Trailing 7-day variance to forecast

Monthly questions

Monthly P&L roll-up by line item with variance to budget
Monthly menu engineering analysis (sales and contribution margin by item)
Monthly server-level performance with quarter-over-quarter comparison
Monthly cohort analysis of repeat-visit guests (where reservation data is integrated)
Monthly daypart-by-day-of-week heatmap

If the pipeline cannot answer these questions, it is incomplete. If it answers them well, it transforms how the operator runs the restaurant.

The implementation sequence

The pipeline gets built in 30 days, in five-day blocks.

Days 1–5: Source audit

Inventory exactly what data your current POS is capturing. Toast and Square both expose this through their developer documentation. The audit produces a list of data points available — typically 60–120 distinct fields across orders, payments, employees, menu items, and modifiers.

The audit also identifies gaps. The most common gap is that the POS is not configured to capture certain useful data — modifier reasons, void reasons, comp categories. Fixing the configuration is often a higher-ROI step than building the pipeline.

Days 6–12: Extraction setup

Set up the extraction layer. For most operators, this is configuring a third-party integration tool (Stitch, Fivetran, or similar) to connect to the POS API and pull data on a scheduled basis (typically every 1–4 hours).

The first sync takes 4–12 hours depending on the volume of historical data being backfilled. After the first sync, ongoing syncs are incremental and fast.

Days 13–18: Storage and modeling

Configure the destination (BigQuery, Postgres, or chosen alternative). Build a small set of derived tables on top of the raw POS data:

A daily summary table (one row per location per day)
An hourly sales table (one row per location per hour)
A server performance table (one row per server per day)
A menu item mix table (one row per item per day)
A modifier usage table (one row per modifier per day)

These derived tables are what the reporting layer queries. They abstract the messiness of raw POS data into a clean, query-able structure.

Days 19–25: Reporting layer

Build the reporting dashboards. The minimum useful set:

Daily snapshot: yesterday's headline metrics with comparison to forecast and prior year
Weekly mix report: trailing 7-day item mix with comparison to trailing 28-day
Server scorecard: trailing 30-day server-level metrics
Daypart heatmap: covers and revenue by daypart and day of week

The dashboards live in Looker Studio (or chosen reporting tool) and are accessible to the operator through a web link.

Days 26–30: Calibration and rollout

Run the pipeline for a week, validating that the numbers match what the POS itself reports. Calibrate any discrepancies. Train the operator on accessing and interpreting the dashboards.

By day 30, the operator should be checking the daily snapshot every morning and the weekly mix report every Tuesday.

A data pipeline that takes 30 days to build is a 30-day investment that pays back for the life of the operation. The compounding effect of better daily decisions adds up faster than most operators realize.

What changes after 90 days

Operators who install the pipeline and use the data competently for 90 days see:

Faster operational decisions: weekly adjustments based on data instead of intuition
Better menu management: items that are underperforming get repositioned; items that are over-performing get protected
Server-level performance management: outlier servers (positive and negative) become visible and addressable
Tighter labor scheduling: data-driven schedules match actual demand patterns rather than legacy patterns
Earlier problem detection: anomalies in COGS, labor, or sales show up in the daily snapshot 2–4 weeks before they would have appeared in the monthly P&L

The financial impact varies by how the operator uses the data. Operators who actively work the pipeline typically see 1.5–3 points of margin improvement within 6 months — not because the pipeline produces savings directly, but because the operator's decisions get measurably better with the data in hand.

Common implementation traps

Trap 1: Building it and not using it

The most common failure mode. The pipeline gets built, the dashboards get configured, and then the operator opens them twice a week for a month and then stops. The fix is to make the daily snapshot part of the morning routine — opened every day, ideally with a 5-minute calendar block.

Trap 2: Trying to answer too many questions

A pipeline that produces 47 different reports is a pipeline that nobody uses. The fix is to start with the 4–6 highest-leverage reports and add more only when those are being actively used.

Trap 3: Letting the POS configuration drift

The pipeline is only as good as the POS configuration upstream. If modifiers stop being categorized, the modifier report loses signal. If servers stop entering void reasons, the void analysis becomes meaningless. The fix is a monthly POS configuration audit that confirms the upstream data is still being captured correctly.

Trap 4: No accountability for the data

If the pipeline shows that server A is comping 2.5x more than server B, but nobody talks to server A about it, the pipeline is producing data without producing decisions. The data is the precondition for the conversation; the conversation is what produces improvement.

Multi-location considerations

For multi-location operators, the pipeline architecture is similar but the data layer carries an additional dimension (location_id) and the reporting layer typically includes cross-location comparison views.

The most useful cross-location views:

Comparable daypart performance: location A vs location B vs location C on Saturday brunch
Comparable mix shift: what's selling at each location, where the menu engineering opportunities are
Comparable server metrics: identifying outliers across the group
Comparable operational metrics: which location has the highest comp rate, which has the highest void rate

The cross-location data also supports the unit economics gate-checks framework for any operator considering opening another location.

When the pipeline is the right project

Three signals.

Signal 1: You currently have visibility into daily sales and not much else. The pipeline produces visibility into 10x more operational data without 10x more work.

Signal 2: You operate 2 or more locations and cannot easily compare them. The pipeline makes comparison trivial.

Signal 3: You are planning a major operational change — menu redesign, labor model change, expansion — and want a data foundation to measure the change against. The pipeline produces the baseline.

When it isn't

Two cases.

Case 1: Your POS data is unreliable or inconsistent. If menu items are not categorized correctly, if modifiers are entered inconsistently, if servers don't use the system as intended, the pipeline propagates the garbage. Fix the upstream data discipline first.

Case 2: Your operating fundamentals are off and you don't have time to use more data. Better to fix the operating issues first and add the pipeline once you have the bandwidth to consume what it produces.

Getting started

Three steps in the next 30 days.

Step 1: Audit your current POS configuration. Are voids categorized? Are comps categorized? Are modifiers consistently entered? Fix the configuration gaps first.

Step 2: Sign up for a third-party integration tool (Stitch, Fivetran, or Airbyte) and connect it to your POS. Even before building reports, get the data flowing into a destination.

Step 3: Build the daily snapshot and weekly mix report first. Use them for two weeks before adding more reports.

By month 2, the pipeline is producing actionable data. By month 4, the operating rhythm has changed to incorporate the pipeline.

If you want help with the pipeline build or want a second set of eyes on the architecture for your specific POS and operation size, book a discovery call. Bring a description of your current POS, your number of locations, and the questions you most want answered. We will walk through the architecture on the call and tell you which layer to build first.

The POS data is the single most under-leveraged operational asset in independent restaurants. The pipeline that exposes it is one of the highest-ROI technical investments an operator can make.