Data Engineering

Gathering proprietary data provides a critical competitive advantage in the age of AI.

Best Practices

Best practices for building data infrastructure that remains maintainable and continues to serve business goals long after the initial implementation.

Infrastructure Design

  • Start with business goals and outcomes, not tools or technologies
  • Avoid resume-driven or vendor-driven development
  • Be intentional with core design choices and trade-offs
  • Consider long-term maintainability and scalability

Toolkit Selection

Tool and Vendor Selection

  • Talk to people who have used the vendor's tools
  • Read vendor-neutral sources (research papers, non-affiliated consultants, books)
  • Assess tools beyond their initial pitch or POC
  • Be aware of potential limitations that may only become apparent with experience

Collective Wisdom

Documentation and Knowledge Sharing

  • Create documentation for someone who has never seen the code before
  • Highlight key decision points in documentation
  • Explain why particular tools were chosen and why the logic was implemented the way it was
  • Use clear naming conventions and avoid deep abstractions
  • Avoid unexplained acronyms and unstated assumptions in documentation

Alignment

Teamwork and Stakeholder Management

  • Implement cross-training where it makes sense
  • Include semi-technical employees from other teams in knowledge sharing
  • Keep stakeholders in the loop throughout the development process
  • Regularly review dashboards and metrics with business stakeholders
  • Avoid building data infrastructure in isolation from the business

Project Management

  • Be clear about project goals and how they improve the business
  • Take on technical debt intentionally, not haphazardly
  • Regularly deliver reviewable results to stakeholders
  • Simplify complex systems where possible
  • Automate reporting to reduce manual work

Key Person Dependency

Avoid situations where critical knowledge is held by a single person.

  • Distribute knowledge across the team
  • Ensure critical processes are not dependent on a single person
  • Document all manual processes and work towards automation

Legacy

Building Lasting Data Infrastructure

  • Create systems that are easy for future developers to understand
  • Implement clear standards in naming conventions
  • Focus on simplicity and clarity in system design
  • Regularly review and update infrastructure to meet evolving business needs
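The naming-convention point above can be checked mechanically rather than by review alone. What follows is a minimal sketch, not a prescribed standard: it assumes a hypothetical convention in which every table name is lowercase snake_case and starts with a layer prefix (`stg`, `int`, `fct`, or `dim`). The pattern, the prefixes, and the function name `find_nonconforming_names` are illustrative assumptions and should be replaced with whatever convention a team actually agrees on.

```python
import re

# Hypothetical convention (an assumption for this sketch):
# <layer prefix>_<snake_case name>, all lowercase.
NAME_PATTERN = re.compile(r"^(stg|int|fct|dim)_[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def find_nonconforming_names(names):
    """Return the names that do not match NAME_PATTERN, in input order."""
    return [name for name in names if not NAME_PATTERN.fullmatch(name)]


if __name__ == "__main__":
    # Example table names, invented for illustration.
    tables = [
        "stg_orders",
        "fct_daily_revenue",
        "TempTable2",
        "dim_customer",
        "orders_final_v3",
    ]
    for name in find_nonconforming_names(tables):
        print(f"non-conforming table name: {name}")
```

A check like this could run in continuous integration so that naming drift is caught when a model is added, instead of being discovered by a future developer.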

Capabilities

Timeless skills for Data Engineers and Analysts combine technical expertise with systems thinking, business acumen, and interpersonal skills. Together, these skills help people succeed and grow in data engineering and analysis roles.

  1. Thinking in Systems
    • Understand how individual components fit into larger systems
    • Consider the broader impact of changes beyond immediate problem-solving
    • Design robust, maintainable, and adaptable systems
    • Visualize the bigger picture and long-term implications
  2. Data Intuition
    • Develop a deep understanding of how datasets interact and fit together
    • Recognize potential issues and errors in data
    • Understand business workflows and their translation into data
    • Grasp the relationship between business metrics and underlying data models
  3. Team Growth and Collaboration
    • Help elevate the skills of team members
    • Communicate ideas effectively
    • Influence others to make sound decisions
    • Create a collaborative work environment
    • Set standards and mentor colleagues
  4. Continuous Learning and Curiosity
    • Stay updated on new technologies and industry trends
    • Explore both technical and business aspects of the field
    • Engage in side projects or deep dives into research papers
    • Understand the business context and economic factors
  5. Bridging Technical and Business Domains
    • Communicate effectively with both technical and non-technical stakeholders
    • Understand how data solutions impact business outcomes
    • Maintain technical knowledge even when moving into leadership roles
  6. Adaptability and Long-term Thinking
    • Focus on skills that transcend specific technologies
    • Prepare for changes in the tech world by developing foundational competencies
    • Balance immediate problem-solving with long-term system design
  7. Execution and Project Management
    • Lead projects from start to finish
    • Implement robust and maintainable systems regardless of the tools used

Roles

Related job titles and contribution expectations.

Flow of Data

Expand the ability to capture data and feed valuable insights to AI agents.

  1. Data Acquisition
  2. Data Warehouse
  3. Data Visualization
  4. Data Science
  5. Data Pipelines
  6. Data Governance

Database Testing

dbt-expectations is a dbt package, inspired by Great Expectations, that lets you write Great Expectations-style tests for your data models directly in dbt, without running Great Expectations itself.
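To make the idea of an expectation-style test concrete, here is a minimal sketch in plain Python. It is not the dbt-expectations API: in dbt the tests are declared in a model's YAML file and compiled to SQL. The function names below are borrowed from the expectation naming style for readability, and the sample `orders` rows are invented for illustration. Each function returns the failing rows, mirroring the dbt convention that a test passes when its query returns no rows.

```python
def expect_column_values_to_not_be_null(rows, column):
    """Return the rows whose value in `column` is missing or None."""
    return [row for row in rows if row.get(column) is None]


def expect_column_values_to_be_between(rows, column, min_value, max_value):
    """Return the rows whose non-null value lies outside [min_value, max_value]."""
    return [
        row
        for row in rows
        if row.get(column) is not None
        and not (min_value <= row[column] <= max_value)
    ]


def expect_column_values_to_be_unique(rows, column):
    """Return every row that repeats a non-null value already seen in `column`."""
    seen = set()
    duplicates = []
    for row in rows:
        value = row.get(column)
        if value is None:
            continue
        if value in seen:
            duplicates.append(row)
        else:
            seen.add(value)
    return duplicates


if __name__ == "__main__":
    # Invented sample data with one null, one negative amount, one duplicate id.
    orders = [
        {"order_id": 1, "amount": 25.0},
        {"order_id": 2, "amount": None},
        {"order_id": 2, "amount": -5.0},
    ]
    checks = {
        "amount not null": expect_column_values_to_not_be_null(orders, "amount"),
        "amount in range": expect_column_values_to_be_between(orders, "amount", 0, 10_000),
        "order_id unique": expect_column_values_to_be_unique(orders, "order_id"),
    }
    for name, failures in checks.items():
        status = "PASS" if not failures else f"FAIL ({len(failures)} rows)"
        print(f"{name}: {status}")
```

Returning the failing rows, rather than a bare True or False, keeps the evidence needed to debug a failed check.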

Data Caching

Tech