Data Operations · Nonprofit CRM

Three sources.
One campaign-ready file.

Simulates the multi-source donor data normalization workflow used before a Salesforce Data Loader import — field mapping, format standardization, duplicate resolution, and validated output with a full audit log.

Salesforce Contacts export
Email platform export
Event sign-in spreadsheet
1. Ingest source files (3 sources)
2. Map and normalize fields (source field, source, canonical field)
3. Validate and correct quality issues
4. Detect and resolve duplicates
5. Generate output and audit log

Pipeline complete — output ready

39 unique records · campaign-ready · Salesforce Data Loader compatible

About this project

What this demonstrates — and why it was built this way

Before any donor data can be loaded into a CRM — Salesforce, Microsoft Dynamics, Blackbaud, or otherwise — it has to be cleaned. Every nonprofit technology implementation runs this gauntlet: data arrives from three or four sources, each with different field names, different formatting conventions, and duplicate records for the same person across systems. Someone has to normalize it, resolve the duplicates, document every decision, and produce a validated output file the import tool will actually accept.

This simulator models that workflow end to end. The pipeline isn't decorative — the JavaScript is actually executing each step against the source data: detecting that "Massachusetts" should be "MA," recognizing that James Patel appears in all three source systems under the same email address and needs to be merged into one master record, flagging Betty Taylor's phone number as invalid rather than silently passing it through. The output CSV it produces is real and downloads correctly. The audit log timestamps every transformation.
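A minimal sketch of what the standardization and validation step might look like. The function names, the result shapes, the three-entry lookup table, and the phone display format are illustrative assumptions, not the simulator's actual code. The point it shows is that an invalid value is flagged and kept as-is rather than silently rewritten.

```javascript
// Hypothetical lookup table; a full version would list every state and territory.
const STATE_ABBREVIATIONS = new Map([
  ["massachusetts", "MA"],
  ["new york", "NY"],
  ["california", "CA"],
]);

// Returns the normalized state plus flags the caller can write to an audit log.
function normalizeState(raw) {
  const trimmed = String(raw ?? "").trim();
  // Two letters are assumed to already be an abbreviation; this sketch does
  // not check that the abbreviation is a real state.
  if (/^[A-Za-z]{2}$/.test(trimmed)) {
    const upper = trimmed.toUpperCase();
    return { value: upper, changed: upper !== raw, flagged: false };
  }
  const abbreviation = STATE_ABBREVIATIONS.get(trimmed.toLowerCase());
  if (abbreviation) {
    return { value: abbreviation, changed: true, flagged: false };
  }
  // Unknown value: keep it untouched and flag it for review.
  return { value: raw, changed: false, flagged: true };
}

// Accepts 10-digit US numbers, with or without a leading country code "1".
function normalizePhone(raw) {
  const digits = String(raw ?? "").replace(/\D/g, "");
  const national =
    digits.length === 11 && digits.startsWith("1") ? digits.slice(1) : digits;
  if (national.length !== 10) {
    // Invalid: return the original value so nothing is lost, but mark it.
    return { value: raw, valid: false };
  }
  const formatted = `(${national.slice(0, 3)}) ${national.slice(3, 6)}-${national.slice(6)}`;
  return { value: formatted, valid: true };
}
```

Returning a small result object instead of a bare string lets the caller decide what to log: a `changed` result becomes a transformation entry, and a `flagged` or invalid result becomes a review item.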

Design decisions worth explaining
Why three specific source systems
The combination of CRM export, email platform, and event sign-in sheet is intentional. These are the three most common sources a nonprofit data team encounters when a new client brings their data for a CRM implementation or migration. Each one has a different schema, a different level of data discipline, and a different set of quality problems. The CRM export has NPSP-specific field names (npo02__TotalOppAmount__c) that reflect real Salesforce NPSP vocabulary. The email platform export uses abbreviated field names (fname, state_abbr). The event sign-in sheet uses plain English headers with spaces, the kind of thing a volunteer typed into Google Sheets. Handling all three in one pipeline is the actual job.
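Explicit mapping of this kind can be sketched as one lookup table per source. The source field names npo02__TotalOppAmount__c, fname, and state_abbr come from the description above, and FirstName and MailingState are standard Salesforce Contact fields. The event headers and every canonical name (first_name, state, total_giving) are assumptions made for illustration.

```javascript
// One explicit map per source: source field name -> canonical field name.
// Canonical names and the event sheet headers are hypothetical.
const FIELD_MAPS = {
  crm: {
    FirstName: "first_name",
    MailingState: "state",
    npo02__TotalOppAmount__c: "total_giving",
  },
  email: {
    fname: "first_name",
    state_abbr: "state",
  },
  event: {
    "First Name": "first_name",
    "State": "state",
  },
};

// Renames a record's fields to the canonical schema and reports any
// fields that have no mapping, so nothing is dropped without a trace.
function mapRecord(source, record) {
  const map = FIELD_MAPS[source];
  if (!map) {
    throw new Error(`Unknown source: ${source}`);
  }
  const mapped = { source };
  const unmapped = [];
  for (const [field, value] of Object.entries(record)) {
    if (Object.prototype.hasOwnProperty.call(map, field)) {
      mapped[map[field]] = value;
    } else {
      unmapped.push(field);
    }
  }
  return { mapped, unmapped };
}
```

Keeping the maps as plain data means the mapping itself can be reviewed, or printed into the audit log, without reading any transformation logic.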
Why the audit log matters
The animated pipeline is the demo. The audit log is the signal. In production data work, every transformation has to be documented — not because it looks good, but because when a client asks "why does this record show MA instead of Massachusetts?" or "why did we end up with 39 records instead of 47?" there has to be an answer. The audit log here captures what changed, why, and which source record was treated as the master in each merge. That accountability layer is what distinguishes a data operations professional from someone who just ran a script and hoped for the best.
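One way such a log could be structured is an append-only list of timestamped entries. This is a sketch under assumptions: the entry fields (action, recordId, field, before, after, reason), the text line format, and the createAuditLog name are hypothetical, not the simulator's own.

```javascript
// Creates an append-only audit log. The clock is injectable so the
// output is reproducible in tests; by default it uses the current time.
function createAuditLog(now = () => new Date()) {
  const entries = [];
  return {
    // Records one decision: what changed, on which record, and why.
    record({ action, recordId, field, before, after, reason }) {
      entries.push({
        timestamp: now().toISOString(),
        action,
        recordId,
        field,
        before,
        after,
        reason,
      });
    },
    // Returns copies so callers cannot rewrite history.
    entries() {
      return entries.map((entry) => ({ ...entry }));
    },
    // Renders the log as plain text, one line per entry.
    toText() {
      return entries
        .map(
          (e) =>
            `${e.timestamp} ${e.action} ${e.recordId} ${e.field}: "${e.before}" -> "${e.after}" (${e.reason})`
        )
        .join("\n");
    },
  };
}
```

With entries shaped like this, the question "why does this record show MA instead of Massachusetts?" reduces to filtering the log by record and field.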
The merge logic is a deliberate choice
When James Patel appears in all three sources, the pipeline doesn't just pick one record and discard the rest — it retains the CRM record as the master (because it has giving history, which is the most valuable field) and merges the engagement score from the email platform and the interest level from the event sign-in into that master record. That's best-value field selection: the output record is richer than any single source. The Patricia Harris case surfaces a name variant — "Pat" in the email platform vs. "Patricia" in the CRM — and the pipeline resolves it in favor of the formal name. These aren't accidents; they're documented decisions.
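The merge rule described above can be sketched as follows: group records by lower-cased email, take the highest-priority source as the master, and fill only the master's empty fields from the other records. The priority order, field names, and return shape are assumptions for illustration. Because the master's populated fields are never overwritten, a CRM first name such as "Patricia" wins over "Pat" from a lower-priority source.

```javascript
// Assumed priority: the CRM record wins because it carries giving history.
const SOURCE_PRIORITY = ["crm", "email", "event"];

const rank = (source) => {
  const index = SOURCE_PRIORITY.indexOf(source);
  return index === -1 ? SOURCE_PRIORITY.length : index;
};

const isEmpty = (value) => value === undefined || value === null || value === "";

// Merges records that share an email address. Records without an email
// are returned separately instead of being dropped.
function mergeByEmail(records) {
  const groups = new Map();
  const noEmail = [];
  for (const rec of records) {
    const key = String(rec.email ?? "").trim().toLowerCase();
    if (!key) {
      noEmail.push(rec);
      continue;
    }
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(rec);
  }

  const merged = [];
  for (const [email, group] of groups) {
    const ranked = [...group].sort((a, b) => rank(a.source) - rank(b.source));
    const master = { ...ranked[0], email };
    const decisions = [];
    for (const other of ranked.slice(1)) {
      for (const [field, value] of Object.entries(other)) {
        if (field === "source") continue;
        // Fill gaps only; the master's existing values are kept.
        if (isEmpty(master[field]) && !isEmpty(value)) {
          master[field] = value;
          decisions.push({ field, takenFrom: other.source });
        }
      }
    }
    merged.push({
      record: master,
      mergedFrom: ranked.map((r) => r.source),
      decisions,
    });
  }
  return { merged, noEmail };
}
```

The `decisions` array is what would feed the audit log: each entry names a field and the source it was taken from, so the merged record can be traced back to its inputs.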
What this is and isn't
This is a working model of how I think about data operations problems, built in JavaScript so it runs in a browser without infrastructure. It isn't production ETL code or a Salesforce implementation, and it isn't a claim about how I'd engineer a real pipeline at scale. That would use Python, pandas, a proper dedup library, and a validated import process with error row review. What it demonstrates is the underlying methodology: define the canonical schema first, map sources to it explicitly, treat every quality issue as a documented decision rather than a silent correction, and produce output your downstream system can trust. That approach doesn't change whether the tool is written in FoxPro, Python, or JavaScript.
Operations covered by the pipeline
🗂
Multi-source ingestion
CRM, email platform, and event systems — each with a different schema and field naming convention
🔤
Field mapping
12 source field variants resolved to a single canonical donor schema across all three sources
🔧
Format standardization
Phone number, state abbreviation, and engagement score normalization, with counts and examples
🔍
Duplicate resolution
Email-based cross-source matching with best-value field selection and documented merge rationale
📋
Audit trail
Timestamped log of every transformation, flag, and merge decision — downloadable as a text file
📤
Salesforce-ready output
Output schema matches NPSP Contact import format — compatible with Salesforce Data Loader upsert
Built by Josh Maynard  ·  joshmaynard.dev