Data Footprint
All we are is state machines driven by data flow.
What routine checks and balances need to be set up to manage this process optimally?
Subject Expertise
- Data governance: The oversight to ensure that data brings value and supports your business strategy.
- Interoperability and usability: The ability to interact with multiple user profiles and other systems.
- Integrity: The operational processes that keep the data warehouse running accurately in production.
- Security, privacy, compliance: Protect the data from threats.
- Reliability: The ability of a system to recover from failures and continue to function.
- Performance efficiency: The ability of a system to adapt to changes in load.
- Cost optimization: Reduce costs while maximizing the value delivered.
- ETL
- Standards Compliance
The intelligence you gain is only as good as the truth of your data.
Data Business Rules
What are you legally required to do before collecting personal information?
You're in "real privacy" territory as soon as you collect names and contact details in NZ. LinkedIn also expects a public privacy policy URL before users authorise your app.
NZ Privacy Act 2020
Any NZ business or solo operator collecting personal information is an "agency" under the Privacy Act 2020. Thirteen privacy principles apply.
| Principle | Rule | Sales Dev Impact |
|---|---|---|
| Purpose | State why you collect it | Sales outreach, CRM logging, campaign tracking |
| Necessity | Only collect what's reasonably needed | Name, email, LinkedIn URL — not everything available |
| Security | Store securely | Supabase (encrypted at rest), RBAC access controls |
| Disclosure | Name who you share it with | Resend, CRM, hosting providers, analytics |
| Access | Let people see and correct their data | Provide a mechanism on request |
| Deletion | Delete on request | Honour within reasonable time |
LinkedIn API
LinkedIn requires a public privacy policy URL before users authorise your app. Your policy must be at least as strong as LinkedIn's.
| Requirement | What's Needed |
|---|---|
| Privacy Policy URL | Publicly accessible, linked in app config |
| Data handling | Explain what member data you store and why |
| Consent | Show policy to users before they grant access |
| Revocation | Delete data when consent is revoked |
| Storage limits | Follow LinkedIn's limits on member data retention |
Privacy Policy
Host it at a stable URL (/privacy on your domain, or a public doc) and reference it from the LinkedIn app config and the product website. A guide for NZ businesses is linked below.
| Section | Content |
|---|---|
| What you collect | Names, emails, LinkedIn profile URLs, messages, event logs |
| Why you collect it | Sales pipeline, campaign tracking, AI-assisted outreach |
| Where it's stored | Supabase, Vercel, Resend, LinkedIn API, analytics |
| Who processes it | Third-party services with data access |
| Retention | How long per data type, deletion on request |
| Contact | How to access, correct, or delete |
Commissioning Gate
No Sales Dev Agent feature that touches personal data ships without:
- Privacy policy hosted at public URL
- LinkedIn app config references that URL
- Data collection limited to stated purposes
- Deletion pathway tested end-to-end
- Third-party processors named in policy
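The deletion pathway in the gate above can be expressed as executable logic: every processor named in the privacy policy must have a registered deletion handler, and a single request fans out to all of them. This is a minimal sketch, the processor names and handler shapes are illustrative, not a real integration with Supabase or Resend.

```typescript
// Hypothetical sketch: each processor named in the privacy policy registers
// a deletion handler; one request fans out to all of them and fails loudly
// if any declared processor is unaccounted for.
type DeletionReceipt = { processor: string; deleted: boolean };

const processors: Record<string, (email: string) => DeletionReceipt> = {
  // In a real system these would call the Supabase, Resend, etc. APIs.
  supabase: (_email) => ({ processor: "supabase", deleted: true }),
  resend: (_email) => ({ processor: "resend", deleted: true }),
  analytics: (_email) => ({ processor: "analytics", deleted: true }),
};

function handleDeletionRequest(
  email: string,
  declaredProcessors: string[]
): DeletionReceipt[] {
  // "Deletion pathway tested end-to-end" in executable form: a processor
  // named in the policy without a handler is a hard failure.
  for (const name of declaredProcessors) {
    if (!(name in processors)) {
      throw new Error(`No deletion handler for declared processor: ${name}`);
    }
  }
  return declaredProcessors.map((name) => processors[name](email));
}

const receipts = handleDeletionRequest("jane@example.com", [
  "supabase",
  "resend",
  "analytics",
]);
console.log(receipts.every((r) => r.deleted)); // true
```

The useful property is the failure mode: adding a processor to the policy without wiring its handler breaks the gate immediately, rather than silently at a user's deletion request.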
Playbook
Here's an optimized framework for AI-driven business data analysis operations, synthesized from industry best practices and emerging agent architectures:
| Role | Job → Workflow | Prompt Example | Context | Best Fit Model | Essential Tools |
|---|---|---|---|---|---|
| Data Collector | Automate data ingestion → ETL pipelines | "Act as an ETL pipeline architect. Identify all relevant data sources in [department], establish automated ingestion via APIs/web scraping, and structure into [format] for analysis. Document schema relationships." | Cross-system data integration, real-time streaming | GPT-4 Turbo + CodeLlama-70B | Python (BeautifulSoup/Scrapy), Airflow, Fivetran, Snowflake, AWS Glue |
| Quality Sentinel | Clean/validate datasets → Anomaly detection | "Act as data integrity agent. Analyze [dataset] for missing values, outliers, and formatting errors. Implement automated correction rules while maintaining audit trail. Generate data health report." | Regulatory compliance, ML model readiness | Claude 3 Sonnet | Great Expectations, Pandera, Trifacta, Dataiku |
| Insight Miner | Exploratory analysis → Hypothesis testing | "Act as principal analyst. Using [dataset], test whether [business hypothesis] holds. Apply appropriate statistical methods (p < 0.05). Visualize relationships between [variables]. Suggest 3 actionable next steps." | Strategic decision support, market trend analysis | GPT-4 Omni + Wolfram | SQL, R/Python statsmodels, Jupyter, Tableau |
| Forecast Architect | Predictive modeling → Scenario planning | "Act as quantitative modeler. Develop ARIMA/prophet model for [metric] forecasting. Backtest against last 5 years data. Show 3 scenarios (+/- 15% variance). Output interactive visualizations with confidence intervals." | Financial planning, inventory optimization | Google Gemini Advanced | Prophet, TensorFlow, Azure ML, Power BI |
| Storyteller | Insight synthesis → Executive reporting | "Act as VP Strategy. Transform [analysis findings] into C-suite presentation: 5 slides max. Highlight 3 key opportunities, 2 risks, and 1 recommended initiative with ROI projection. Use automotive industry benchmarks." | Board-level communication, investor relations | Claude 3 Opus | Think-Cell, PowerPoint, Domo, Looker |
| Process Optimizer | Continuous improvement → AutoML | "Act as ML Ops engineer. Implement automated feature engineering on [dataset]. Deploy best-performing model from H2O AutoML. Monitor drift with [threshold]. Create retraining pipeline in AWS SageMaker." | Operational efficiency, real-time decision systems | Mistral-Large + SAS Viya | H2O.ai, SageMaker, MLflow, Kubeflow |
| Compliance Guard | GDPR/PII protection → Audit trails | "Act as data governance officer. Anonymize [sensitive fields] using k-anonymity (k=5). Implement role-based access controls. Generate compliance certificate with embedded watermarks for audit purposes." | Regulatory requirements, ethical AI practices | IBM Watsonx | Immuta, Privitar, Collibra, Azure Purview |
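The Compliance Guard row asks for k-anonymity (k=5). The check itself is simple: group records by their quasi-identifier values and verify every group has at least k members. A minimal sketch, with field names (ageBand, region) as illustrative quasi-identifiers:

```typescript
// Minimal k-anonymity check: a dataset is k-anonymous over a set of
// quasi-identifiers if every combination of those values appears in at
// least k records.
type Row = Record<string, string>;

function isKAnonymous(rows: Row[], quasiIdentifiers: string[], k: number): boolean {
  const counts = new Map<string, number>();
  for (const row of rows) {
    // Group key is the tuple of quasi-identifier values.
    const key = quasiIdentifiers.map((q) => row[q]).join("|");
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts.values()].every((n) => n >= k);
}

const sample: Row[] = [
  { ageBand: "30-39", region: "Auckland" },
  { ageBand: "30-39", region: "Auckland" },
  { ageBand: "40-49", region: "Wellington" },
];
console.log(isKAnonymous(sample, ["ageBand", "region"], 2)); // false: one group has 1 record
```

In practice the remediation step (generalising or suppressing the offending group) is the hard part; this only detects the violation.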
Platform
Toolkit and tech stack to maximize gains from your data footprint.
- Agent Stack Architecture: Layer LangChain/Microsoft AutoGen for orchestration between specialized models
- Tool Integration: Use vector databases (Pinecone/Weaviate) for contextual memory across workflows
- Validation: Implement human-in-the-loop review points pre-execution of critical business decisions
- Security: Zero-trust design with encrypted data vaults (Vault12/Anjuna) for sensitive financial data
This framework enables autonomous operation while maintaining necessary governance controls, combining the analytical depth of tools like Databricks with the conversational interface of LAMBDA agents. For maximum ROI, prioritize implementations that bridge departmental silos through unified data lakes on Snowflake/Redshift.
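The validation bullet above, human-in-the-loop review before critical decisions execute, can be sketched as a small gate. This is a shape illustration only; the action type and approval flow are assumptions, not a LangChain or AutoGen API:

```typescript
// Sketch of a human-in-the-loop gate: critical actions queue for review
// instead of executing immediately; only explicit approval releases them.
type Action = { id: string; critical: boolean; run: () => string };

const pendingReview: Action[] = [];

function submit(action: Action): string {
  if (action.critical) {
    pendingReview.push(action); // park it for a human reviewer
    return `queued:${action.id}`;
  }
  return action.run(); // non-critical actions execute autonomously
}

function approve(id: string): string {
  const idx = pendingReview.findIndex((a) => a.id === id);
  if (idx === -1) throw new Error(`No pending action: ${id}`);
  const [action] = pendingReview.splice(idx, 1);
  return action.run();
}

console.log(submit({ id: "report", critical: false, run: () => "sent" })); // sent
console.log(submit({ id: "refund", critical: true, run: () => "refunded" })); // queued:refund
console.log(approve("refund")); // refunded
```

The design choice worth noting: criticality is a property of the action, decided at submission time, so the agent cannot reclassify its way around the gate mid-workflow.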
AI Strategy Checklist
A comprehensive Data and AI strategy checklist that addresses all aspects of data management across SaaS products, internal systems, and manual processes such as spreadsheets. It is designed to be valuable for any business owner or director, regardless of technical background.
Data Inventory and Ecosystem Mapping
- What is the complete inventory of your data sources? (SaaS platforms, internal databases, spreadsheets, documents, etc.)
- Which systems contain your most business-critical data?
- For each data source, who "owns" the data and is responsible for its accuracy?
- What types of data do you collect and store? (customer, financial, operational, etc.)
- Which systems serve as your "source of truth" for different types of reference data?
- How much of your critical business data resides in unstructured formats (emails, documents) or personal tools (spreadsheets)?
- Are there any "shadow IT" systems or unauthorized data repositories in use?
Data Quality and Governance
- How do you measure and ensure data quality across all systems?
- What processes exist for data validation and cleaning?
- How do you handle incomplete or inaccurate data?
- What governance policies are in place regarding data access, usage, and modification?
- How do you comply with relevant data regulations (GDPR, CCPA, industry-specific)?
- Do you have a data dictionary or catalog that defines key data elements across systems?
- How do you track the lineage of data as it moves between systems?
Data Integration and Flow
- How is data currently shared or synchronized between your various systems?
- What integration challenges exist between your SaaS platforms and internal systems?
- Are you using any middleware or iPaaS solutions (Zapier, MuleSoft, etc.) to connect systems?
- How much manual effort is required to move data between systems?
- What are your biggest pain points regarding data duplication or inconsistency?
- Do you have real-time data needs, and if so, are they being met?
- How do reference data updates propagate across your ecosystem?
Data Access and Utilization
- Who has access to what data, and how is this access controlled?
- What tools do different stakeholders use to access and analyze data?
- How long does it typically take to gather the data needed for key decisions?
- Are there bottlenecks in accessing or retrieving data when needed?
- Do business users have self-service analytics capabilities, or are they dependent on IT?
- What skills gaps exist in your organization regarding data analysis?
Reporting and Analytics Capabilities
- What reports are regularly generated from your data, and how are they created?
- How much time is spent preparing reports versus analyzing them?
- Are your current analytics capabilities descriptive, diagnostic, predictive, or prescriptive?
- What visualization tools are being used (Tableau, Power BI, custom dashboards)?
- How frequently is data refreshed in your reports and dashboards?
- What metrics or KPIs do you track most closely?
- Are there any insights you wish you could extract but currently cannot?
Decision-Making Processes
- Which decisions are currently data-driven versus intuition-based?
- How quickly can you act on insights derived from your data?
- What is the typical lag time between data collection and decision-making?
- How do you measure the effectiveness of data-driven decisions?
- Are there recurring decisions that could be automated?
- What operational inefficiencies could be addressed through better data usage?
AI Readiness and Opportunities
- What specific business problems are you trying to solve that AI could potentially address?
- Have you identified high-value use cases for AI implementation?
- Is your data of sufficient quality and quantity to support AI initiatives?
- What experiments or proofs of concept have you conducted with AI?
- Do you have clear objectives and measurable outcomes for potential AI implementations?
- Which areas of your business would benefit most from predictive analytics?
- Are there repetitive cognitive tasks that could be augmented or automated with AI?
Implementation and Resource Considerations
- What is your current technology infrastructure's ability to support advanced analytics or AI?
- Do you have the necessary in-house expertise, or will you need external support?
- What is your budget allocation for data and AI initiatives?
- How would you prioritize potential data and AI projects?
- What change management considerations should be addressed?
- How will you measure ROI on data and AI investments?
- What timeline would be realistic for implementing key initiatives?
Strategy and Future Planning
- What is your long-term vision for leveraging data and AI?
- How does your data strategy align with your overall business strategy?
- What competitive advantages could be gained through better data utilization?
- How do you plan to scale your data capabilities as your business grows?
- What emerging technologies or approaches should be on your radar?
- How will you continuously improve your data ecosystem?
- What skills development is needed to support your future data and AI initiatives?
The Commissioning Instrument
The data footprint is not an audit. It's a commissioning instrument — a mechanical check that reveals the gap between what's built and what's connected.
Every table, every API route, every UI entry point has a maturity state:
| State | What It Means | Signal |
|---|---|---|
| Schema exists | Structure is defined | Drawings done |
| Data exists | Records populated | Material procured |
| API exists | Programmatic access | Wired |
| UI exists | Human can reach it | Controls proven |
| Feedback loop closes | Usage improves the system | Operating |
A system can score perfectly on schema and data while having zero UI or API entry points. Structure without connection is inert. The data footprint reveals this gap mechanically — not through judgment, but through counting entry points.
The question: for each table, can a human reach it? Can an agent use it? If neither, the capability exists in theory only.
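The maturity ladder above can be computed mechanically, per table, from yes/no evidence. A sketch, where the field names (schemaExists, hasApi, hasUi, feedbackLoopCloses) are assumed stand-ins for the real introspection fields:

```typescript
// Derive a table's maturity state from mechanical evidence: each rung
// requires all the rungs before it, and the first missing rung is where
// the table stops.
type TableEvidence = {
  name: string;
  schemaExists: boolean;
  recordCount: number;
  hasApi: boolean;
  hasUi: boolean;
  feedbackLoopCloses: boolean;
};

const LADDER = ["none", "schema", "data", "api", "ui", "operating"] as const;

function maturity(t: TableEvidence): (typeof LADDER)[number] {
  const rungs = [
    t.schemaExists,
    t.recordCount > 0,
    t.hasApi,
    t.hasUi,
    t.feedbackLoopCloses,
  ];
  let reached = 0;
  for (const ok of rungs) {
    if (!ok) break;
    reached++;
  }
  return LADDER[reached];
}

console.log(
  maturity({ name: "deals", schemaExists: true, recordCount: 3,
             hasApi: false, hasUi: false, feedbackLoopCloses: false })
); // data
```

This encodes the "structure without connection is inert" claim: a table with schema and data but no API or UI stops at "data", and counting entry points is all the judgment required.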
The State Machine Score
165 tables. 169 schema files. 71 repository implementations. This is the current footprint.
| State | Name | Evidence Mechanism | Tables at State |
|---|---|---|---|
| 0 | Idea | Concept named in schema file | 165 |
| 1 | Schema | Table in information_schema | 165 |
| 2 | Migration | Table exists in DB | 165 |
| 3 | Data | record_count > 0 in meta_database_introspection | ~12 (estimated; not yet measured) |
| 4 | Repository | Repo class exists in repositories/src/lib/ | 71 |
| 5 | Server Actions | API route responds | Not yet measured |
| 6 | CRUD UI | has_crud_interface = true in meta_table_documentation | 7 |
| 7 | ETL Pipeline | pipeline_in populated in meta_table_documentation | 0 |
| 8 | A2A API | has_agent_interface = true in meta_table_documentation | 3 |
| 9 | E2E Tests | Intent/E2E test pass in CI | Not yet measured |
| 10 | Commissioned | All prior states verified | 0 |
The gap: meta_table_documentation has 0 rows. The instrument exists but has never been seeded. Until it is seeded, states 6, 7, and 8 cannot be read deterministically.
Domain Breakdown
| Domain | Tables | Has Repo | CRUD UI | A2A API | State 3 (data) |
|---|---|---|---|---|---|
| agent | 32 | ~20 | 2 | 2 | agent_profiles: 6, deals: 3, contacts: 5 |
| std | 22 | ~6 | 0 | 0 | 0 |
| priority | 13 | 5 | 0 | 0 | problems: 1 |
| platform | 9 | 4 | 1 | 0 | 0 |
| planning | 9 | 1 | 0 | 0 | plans: 26, tasks: 150, events: 216 |
| tech | 9 | 0 | 0 | 0 | 0 |
| value | 8 | 1 | 0 | 0 | 0 |
| venture | 8 | 3 | 1 | 0 | 0 |
| job | 7 | 0 | 0 | 0 | 0 |
| governance | 4 | 1 | 0 | 0 | permissions: 70, role_perms: 86 |
| knowledge | 3 | 0 | 0 | 0 | 0 |
| other | 32 | ~10 | 3 | 1 | org: 1, system_user: 1 |
The Four Wiring Gaps
To achieve 100% deterministic commissioning:
| # | Gap | File to Wire | What "Done" Looks Like |
|---|---|---|---|
| 1 | meta_table_documentation empty | tools/scripts/etl/database-introspection/run.ts | One row per table, auto-seeded from information_schema |
| 2 | meta_database_introspection never run | Same script | Row counts, column counts, FK graph populated for all 165 tables |
| 3 | CRUD + API detection not writing to DB | tools/scripts/analysis/commissioning-detection/detect-crud-pages.ts + detect-api-endpoints.ts | hasCrudInterface and hasAgentInterface flags written to meta_table_documentation |
| 4 | Dashboard not querying the instrument | GET /api/commissioning/status | Endpoint reads meta_table_documentation — commisioning-status.md becomes a view, not a manual file |
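Gap #1, one documentation row per table auto-seeded from information_schema, could look like the sketch below. The real implementation lives in the run.ts script named above; this is an illustration of the shape only, with column defaults assumed:

```typescript
// Illustrative seeding step for gap #1: one documentation row per table,
// with all flags defaulting to false/null. The flags are flipped later by
// the detection scripts and the introspection run.
type DocRow = {
  table_name: string;
  has_crud_interface: boolean;
  has_agent_interface: boolean;
  pipeline_in: string | null;
  record_count: number;
};

function seedRows(tableNames: string[]): DocRow[] {
  return tableNames.map((table_name) => ({
    table_name,
    has_crud_interface: false, // flipped by detect-crud-pages.ts
    has_agent_interface: false, // flipped by detect-api-endpoints.ts
    pipeline_in: null, // populated by the ETL pipeline registry
    record_count: 0, // filled by the introspection run
  }));
}

const rows = seedRows(["agent_profiles", "deals", "contacts"]);
console.log(rows.length); // 3
```

Seeding with explicit false defaults rather than nulls matters: "not measured" and "measured absent" become distinguishable once the detection scripts start writing.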
The deterministic query (once seeded):
```sql
SELECT
  table_name,
  has_crud_interface,
  has_agent_interface,
  pipeline_in IS NOT NULL AS has_etl,
  record_count > 0 AS has_data,
  meta_score
FROM meta_table_documentation
ORDER BY meta_score DESC, table_name;
```
This single query answers: data footprint, human access, agent access, ETL state — for every table. No manual interpretation required.
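Gap #4's status endpoint then reduces to aggregating this query's rows into per-state counts. A sketch, with the response shape assumed rather than taken from the actual route:

```typescript
// Sketch for gap #4: the status endpoint aggregates the deterministic
// query's rows into counts, so the status page is a view over the
// instrument rather than a hand-maintained file.
type QueryRow = {
  table_name: string;
  has_crud_interface: boolean;
  has_agent_interface: boolean;
  has_etl: boolean;
  has_data: boolean;
};

function commissioningStatus(rows: QueryRow[]) {
  return {
    tables: rows.length,
    withData: rows.filter((r) => r.has_data).length,
    withCrudUi: rows.filter((r) => r.has_crud_interface).length,
    withAgentApi: rows.filter((r) => r.has_agent_interface).length,
    withEtl: rows.filter((r) => r.has_etl).length,
  };
}

const status = commissioningStatus([
  { table_name: "deals", has_crud_interface: true, has_agent_interface: false, has_etl: false, has_data: true },
  { table_name: "std_units", has_crud_interface: false, has_agent_interface: false, has_etl: false, has_data: false },
]);
console.log(status.withData); // 1
```

Because the counts come from the same rows the deterministic query returns, the dashboard can never disagree with the database.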
Context
- Platform — The accumulated assets the footprint measures
- Legal Operations — Broader legal framework this page's business rules sit within
- Sales Dev Agent — First PRD gated on privacy compliance
- Identity & Access — Auth and RBAC that enforces data access controls
- Work Charts — Who does what, and receipts that prove it
- Flow Engineering — The maps that produce this instrument
- Commissioning — Where maturity states are tracked
- Standard KPI Metrics
- Decision Making
Links
- Data Management Strategy
- NZ Privacy Act 2020 — Privacy Principles
- LinkedIn API — Getting Started
- NZ Privacy Policy Guide
- LinkedIn GDPR Compliance
- LinkedIn API Limits
Questions
- If structure without connection is inert, how many of your tables exist in theory only — and what does that reveal about what you've actually built?
- Which data domain in your business has the widest gap between schema maturity and feedback loop maturity?
- When the commissioning instrument shows a table with data but no API, is that a build priority or an intentional boundary?
- Does having a privacy policy change what you collect — or does it just document what you were already doing?
- What breaks first when a user requests deletion and your data flows span five services?