Why Being a Data Engineer Makes You a Better Early-Stage Investor
The Plumber’s Edge
I started my career on Park Avenue before moving downtown to Wall Street.
I wore the suit. I built models. I lived in Excel, grinding out discounted cash flows and leveraged buyout scenarios for companies that had been around for decades. It was a world of knowable metrics. We had historical data, P&L statements, and balance sheets. Our job was to interpret that past to project a plausible future.
We were financial storytellers, armed with spreadsheets.
Then I saw the wave coming.
It wasn’t just “tech” as a sector. It was the rise of machine learning and the realization that data, not code, was becoming the world’s most valuable asset. The models I was building felt archaic. They were based on human intuition and quarterly reports, while new models were emerging that could ingest the world’s information in real time and find patterns no human analyst ever could.
I knew I couldn’t just analyze this new world. I had to build in it.
So I left. I taught myself to code. I dove into Python, SQL, and cloud architecture. My “capstone project” wasn’t a simple web app. It was an institutional-grade data pipeline to scrape, parse, and structure every single filing from the SEC’s EDGAR database. I was building the system I wished I’d had as an analyst. My goal was to feed this data into machine learning models to run a hedge fund.
The project nearly broke me. It wasn’t the ML models that were hard. The models were the easy part.
The hard part was the data.
All my fellow brothers and sisters in the fight know what I mean.
I battled with unstructured text, evolving formats, and a firehose of information that would crash my servers. I learned about data ingestion, cleaning, normalization, schemas, and distributed processing. I learned about the unglamorous, infuriating, and utterly critical work of data engineering.
I became a digital plumber.
At the same time, the world was filling with more and more pipes.
When I eventually returned to investing, I didn’t go back to public markets or Wall Street. I was drawn to the chaos and promise of seed and Series A startups. And I found, to my surprise, that my new skills as a data engineer were infinitely more valuable than my old ones as an investment banker.
Being a data engineer makes you a fundamentally better early-stage investor.
It gives you x-ray vision that finance-trained investors simply do not possess.
They see the pitch deck.
I see the plumbing.
Beyond the Total Addressable Market
At the seed and Series A stage, traditional financial analysis is useless. There are no historicals. There’s no EBITDA to project. There is only a team, a story, and perhaps the faintest glimmer of traction.
The traditional venture investor, even one from a finance background, is forced to evaluate the story. They ask:
How big is the Total Addressable Market?
How charismatic is the founder?
What’s the go-to-market strategy?
What’s the 5-year revenue projection?
These are, frankly, guesses layered on top of ambitions. The 5-year projection is a work of fiction. The TAM slide is an exercise in drawing laughably large circles.
My Wall Street self would have tried to model these fictions. My data engineer self knows to ignore them. I’m not investing in the story. I’m investing in the engine’s capacity to generate a story.
When I built my EDGAR pipeline, the story was “use ML to beat the market.” But the reality was a complex system of Python scripts, message queues, and databases. If that system failed, the story was irrelevant. My ML model was completely dependent on the quality, timeliness, and structure of the data I fed it. Garbage in, garbage out.
Early-stage startups are no different. They all pitch a beautiful story, often powered by “AI” or “big data.” But I’m not listening to the pitch. I’m looking at the pipes.
The Engineer’s Diligence Toolkit
When I meet a founding team, I bring a completely different set of questions. My diligence process is not about their Excel model. It’s about their architecture diagram.
1. Auditing the “Technical” Co-Founder
Investors love to talk about “backing the team.” But what does that mean? For most, it’s pattern matching. “Oh, she was an engineer at Google.” “He was a PM at Facebook.”
This is a proxy for competence, but it’s a weak one.
I don’t care where they worked. I care how they think.
A data engineer has a “systems-level” mindset. We are forced to think about failure, scale, and dependencies. When I talk to a technical founder, I’m not asking about the features they’re building. I’m asking about the foundation.
A traditional investor asks: “What’s on your product roadmap for the next six months?”
I ask: “Walk me through your data model. What are your core entities and how do they relate? Why did you choose that schema?”
The answer to my question tells me everything. A clear, well-considered data model shows they understand their own business at a fundamental level. A messy, convoluted model tells me they are hacking, not engineering. It reveals they are incurring massive technical debt before they’ve even hit their first growth spurt.
I ask: “Show me your logging and monitoring setup. What happens when a critical process fails at 3 AM?”
A founder who fumbles this is a hacker. A founder who says, “We log structured events to Datadog and have PagerDuty alerts on our core data ingestion pipeline” is an engineer.
One is building a demo. The other is building a business.
I learned this from my EDGAR project. The first version had no logging. When it broke, I had no idea why. I spent days just trying to debug it. The second version logged everything. I could fix problems in minutes.
I am investing in the founders who have already learned that lesson.
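What does “logging structured events” actually look like? Here’s a minimal sketch that uses only Python’s standard logging module. It relies on a hypothetical JSON formatter that writes one JSON object per line. The event names and fields (batch_ingested, source, records) are made up for illustration. Shipping these lines to a tool like Datadog, or paging someone through PagerDuty, is deliberately left out.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Hypothetical convention: structured fields arrive via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["error"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str = "ingestion") -> logging.Logger:
    """Logger that writes JSON lines to stdout, where a log shipper could pick them up."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace, so repeated calls don't duplicate output
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("batch_ingested", extra={"fields": {"source": "filings_api", "records": 500}})
    try:
        raise TimeoutError("upstream API timed out")
    except TimeoutError:
        # logger.exception logs at ERROR level and attaches the traceback.
        log.exception("batch_failed", extra={"fields": {"source": "filings_api"}})
```

The important choice here is one machine-readable line per event, with consistent field names. That lets you search, chart, and alert on a failure at 3 AM instead of grepping through free-form text.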
2. Deconstructing the “Product”
The early-stage product is an iceberg. The investor sees the 10 percent above the water: the UI, the website, the mobile app. It’s the pretty frontend that a founder presents in a demo.
That part matters a lot to users, to other investors, and to me!
…but there’s so much more.
The engineer lives in the 90 percent below the water. This is the infrastructure. The databases. The APIs. The event streams. The data pipelines.
This 90 percent is the real product. It’s the real moat.
I recently looked at a startup in the “social media analytics” space. Their pitch was all about their beautiful “insights dashboard” for marketers. The traditional VC was wowed by the charts.
My questions were different.
“You’re pulling data from five different social media APIs. How are you handling ingestion?”
“What’s your process for rate limiting?”
“What message queue are you using to buffer the incoming data?”
“What’s your data’s ‘shape’ once it lands? Is it JSON? Are you flattening it?”
“What’s the latency between an event happening on X and it appearing in your customer’s dashboard?”
The founder’s eyes lit up. No other investor had asked this.
He totally geeked out.
He excitedly whiteboarded his entire architecture, which was built around Kafka, and he talked about the distinction between a raw data lake and a structured warehouse for analytics.
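The “are you flattening it?” question is easier to see with an example. Below is a minimal sketch built around a hypothetical nested JSON post from a social API. The raw payload would stay untouched in the lake. A flattened copy, with dotted column names, becomes a row in the warehouse. All field names are invented.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Collapse nested dicts into one level of dotted keys; lists are kept as-is."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical raw payload as it might land in the data lake.
raw_post = {
    "id": "123",
    "author": {"handle": "@acme", "followers": 5400},
    "metrics": {"likes": 42, "reposts": 7},
    "hashtags": ["launch", "saas"],
}

row = flatten(raw_post)  # a warehouse-ready row; raw_post itself is not modified
# row == {"id": "123", "author.handle": "@acme", "author.followers": 5400,
#         "metrics.likes": 42, "metrics.reposts": 7, "hashtags": ["launch", "saas"]}
```

Lists such as hashtags are left alone here. A warehouse model would usually explode them into a child table. If you’d rather not hand-roll this, pandas has a similar utility, pandas.json_normalize.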
I knew in two minutes that he was a builder. He understood the real problem was not building charts. The real problem was creating a reliable, scalable, and low-latency data factory. The charts were the easy part.
A non-technical investor would have been completely blind to this. They would have focused on the colors of the dashboard. I was able to vet the foundation of the entire business.
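Back to the rate-limiting question. One common answer is a client-side token bucket that paces calls to each upstream API. Here’s a minimal sketch. The rate and capacity numbers are hypothetical, because every real API publishes its own quotas. The clock is injectable, so you can test the behavior without actually sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side rate limiter: bursts up to `capacity` calls, refills `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so tests can use a fake clock
        self.tokens = capacity      # start full, allowing an initial burst
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Consume one token if available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until one token will be available (0.0 if one is ready now)."""
        self._refill()
        return max(0.0, (1 - self.tokens) / self.rate)

    def acquire(self) -> None:
        """Block until a token is available, then consume it."""
        while not self.try_acquire():
            time.sleep(self.wait_time())


# Hypothetical quota: 2 requests/second sustained, bursts of up to 5.
limiter = TokenBucket(rate=2.0, capacity=5)
for page in range(3):
    limiter.acquire()
    # fetch_page(page)  # hypothetical call to the upstream social API
```

In a real ingestion service you would typically keep one bucket per upstream API. That way, one source hitting its limit does not stall the other four.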
3. Auditing “Traction”
Founders come in armed with vanity metrics. “We have 10,000 users!” “We have 50,000 monthly active users!”
My experience building a hedge fund taught me to be ruthless about definitions. When I analyzed EDGAR filings, I had to decide: “What is a ‘material’ event?” The definition mattered. A bad definition led to a bad signal and lost money.
When a founder gives me a traction metric, I ask one simple question.
“Define ‘user’.”
The answer is, again, illuminating.
A weak founder says: “Someone who signed up.”
A strong founder says: “A ‘user’ is an account that has signed up. An ‘active user’ is one who has performed our ‘core activation event’—let’s say, creating three invoices—within their first seven days. We track this via our Segment event pipeline.”
This is the difference between night and day.
The first founder is tracking a vanity metric. The second founder has a data-driven hypothesis about their own business. They understand activation and retention.
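The strong founder’s definition is precise enough to compute. Here’s a minimal sketch. It assumes events arrive as (user_id, event_name, timestamp) tuples, and it uses “invoice_created” as a hypothetical name for the core activation event. In practice this would more likely be a SQL query over warehouse tables, but the logic is the same.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set, Tuple

# Hypothetical event shape: (user_id, event_name, timestamp)
Event = Tuple[str, str, datetime]


def activated_users(
    signups: Dict[str, datetime],
    events: Iterable[Event],
    core_event: str = "invoice_created",   # hypothetical core activation event
    threshold: int = 3,
    window: timedelta = timedelta(days=7),
) -> Set[str]:
    """Users who fired `core_event` at least `threshold` times within `window` of signing up."""
    counts: Dict[str, int] = defaultdict(int)
    for user_id, name, ts in events:
        signed_up = signups.get(user_id)
        if name != core_event or signed_up is None:
            continue
        if signed_up <= ts < signed_up + window:
            counts[user_id] += 1
    return {uid for uid, n in counts.items() if n >= threshold}


d0 = datetime(2024, 1, 1)
signups = {"a": d0, "b": d0, "c": d0}
events = [
    ("a", "invoice_created", d0 + timedelta(days=1)),
    ("a", "invoice_created", d0 + timedelta(days=2)),
    ("a", "invoice_created", d0 + timedelta(days=6)),
    ("b", "invoice_created", d0 + timedelta(days=1)),
    ("b", "invoice_created", d0 + timedelta(days=8)),  # outside the 7-day window
    ("b", "invoice_created", d0 + timedelta(days=9)),
    ("c", "page_view", d0 + timedelta(days=1)),         # not the core event
]
active = activated_users(signups, events)
print(active)                        # {'a'}
print(len(active) / len(signups))    # activation rate for this cohort
```

The weak founder would report three “users” here. The strong founder would report three signups and one activated account, which is the number that actually says something about the business.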
My next question is: “Can I see your event-tracking schema?”
This is the data engineer’s secret weapon. A company’s event-tracking schema is a window into its soul. It shows what they care about. Are they just tracking page_view and user_signup? Or are they tracking granular, meaningful business events like project_created, team_member_invited, and payment_plan_upgraded?
A company that doesn’t track its core business events cannot learn. It cannot A/B test. It cannot understand its users. It cannot build the personalization or ML models it promises in its pitch deck.
A data engineer sees this in seconds. A finance investor wouldn’t even know to ask.
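An event-tracking schema doesn’t have to be exotic. Here’s a minimal, hand-rolled sketch of a tracking plan: each event name maps to the properties it must carry, and every incoming event is checked against that map. The event names come from the examples above. The properties and their types are assumptions for illustration. Commercial tools offer managed versions of this idea, but this is not any vendor’s API.

```python
from typing import Any, Dict, List

# Hypothetical tracking plan: event name -> required properties and their types.
TRACKING_PLAN: Dict[str, Dict[str, type]] = {
    "user_signup": {"user_id": str, "plan": str},
    "project_created": {"user_id": str, "project_id": str},
    "team_member_invited": {"user_id": str, "invitee_email": str},
    "payment_plan_upgraded": {"user_id": str, "from_plan": str, "to_plan": str, "mrr_delta": float},
}


def validate_event(name: str, properties: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the event matches the plan."""
    schema = TRACKING_PLAN.get(name)
    if schema is None:
        return [f"unknown event: {name}"]
    problems: List[str] = []
    for prop, expected in schema.items():
        if prop not in properties:
            problems.append(f"{name}: missing '{prop}'")
        elif not isinstance(properties[prop], expected):
            actual = type(properties[prop]).__name__
            problems.append(f"{name}: '{prop}' should be {expected.__name__}, got {actual}")
    return problems


print(validate_event("project_created", {"user_id": "u1", "project_id": "p9"}))  # []
print(validate_event("button_clicked", {}))  # ['unknown event: button_clicked']
```

A schema like this is what makes activation metrics, A/B tests, and ML features possible later. Each one is just a query over events whose meaning was pinned down in advance.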
4. Assessing Scalability
This is the real moat.
Every seed-stage company plans to 100x or 1000x its user base. A finance-brained investor thinks about this in terms of “operating leverage” and “economies of scale.”
A data engineer thinks about what breaks.
When I built my EDGAR scraper, it worked beautifully on 100 filings going back 36 months. When I pointed it at the entire 20-year history, it melted. The database fell over. The parsing script ran out of memory.
I had to re-architect everything for distributed processing. I learned about scalability through pain.
So I ask founders one of my favorite questions: “Let’s say you get on the front page of TechCrunch tomorrow and 100,000 new users sign up. What part of your system breaks first?”
This is a stress test for foresight.
The naive founder says: “Oh, we’re on AWS. It will just scale.”
The experienced engineer says: “Our monolithic Postgres database. The write-load on the primary user table would be the bottleneck. Our first step would be to implement read replicas to offload reporting queries, and our long-term plan is to shard that table or move to a horizontally scalable database.”
I will invest in the second founder every single time. They have already anticipated failure. They understand that “scale” isn’t a magic word. It’s an engineering problem.
A bad architectural choice at the seed stage can kill a company at the Series B stage. The “technical debt” becomes so massive that the company grinds to a halt, unable to ship new features because they are constantly fighting fires.
As a data engineer, I am underwriting the option to scale. A traditional investor is just hoping for scale.
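The experienced founder’s answer (read replicas now, sharding later) can be sketched as a routing function. This minimal illustration rests on three stated assumptions. There are four hypothetical shards. Each shard has a primary for writes and a replica for reporting reads. Users are placed on shards by hashing their ID and taking the result modulo the shard count.

```python
import hashlib
from typing import Dict, List

# Hypothetical connection strings: one primary plus one read replica per shard.
SHARDS: List[Dict[str, str]] = [
    {"primary": f"postgres://users-{i}-primary", "replica": f"postgres://users-{i}-replica"}
    for i in range(4)
]


def shard_for(user_id: str, n_shards: int = len(SHARDS)) -> int:
    """Stable shard index for a user (SHA-256, not Python's per-process salted hash())."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


def dsn_for(user_id: str, for_write: bool) -> str:
    """Writes go to the shard's primary; reporting reads can go to its replica."""
    shard = SHARDS[shard_for(user_id)]
    return shard["primary"] if for_write else shard["replica"]


print(dsn_for("user-42", for_write=True))
print(dsn_for("user-42", for_write=False))
```

Hash-modulo is the simplest placement scheme, but it remaps most users whenever the shard count changes. Consistent hashing or a lookup directory is the usual fix. The exact scheme isn’t the point. What matters is that the founder has already decided which queries go where.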
The Future is a Data Factory
My time on Wall Street showed me that businesses were valued based on their assets and cash flows. My time as a data engineer taught me that for a modern tech company, the data infrastructure is the primary asset.
Raw data is crude oil. It’s valuable, but it’s messy and unusable in its raw state. The data infrastructure (the pipelines, the warehouses, the event streams) is the refinery. It’s the factory that turns crude data into valuable products: insights, personalized user experiences, and machine learning models.
This is the most significant shift in my thinking.
When a non-technical investor sees a software product that generates monthly recurring revenue (MRR), they try to measure how much it will cost to grow that MRR. I see that too. But underneath, I see a data collection mechanism.
A SaaS tool for, say, dentists, isn’t just a scheduling tool. It is a machine for building the richest, most structured dataset on the planet about how dental practices operate. The real long-term value isn’t the $50/month subscription. It’s that proprietary dataset.
But that dataset is only valuable if it’s engineered correctly. Is it clean? Is it structured? Is it accessible? Is it modeled in a way that “plays nice” with other data you have (or can get access to)?
This brings us to the “AI” buzzword.
As a data engineer, I am uniquely positioned to call bullshit on this. When a founder says “we use AI,” I say:
“Great. Show me your data pipeline. What are your feature stores? How do you version your datasets and models? What’s your data labeling process? How do you monitor for data drift?”
Most of the time, I get a blank stare. They don’t have an AI company. They have a marketing slogan and some API calls to OpenAI.
The few who can answer those questions are the ones building a real, defensible business. They are building the data factory. And the person who controls the factory, not the person who just dreams up a product, is the one who wins.
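“How do you monitor for data drift?” has several good answers. A common one is the Population Stability Index (PSI). PSI compares how a feature is distributed in live traffic against its training baseline, summing (actual share − expected share) × ln(actual share / expected share) over bins. Here’s a minimal sketch that uses equal-width bins over the baseline’s range. The bin count and the small epsilon for empty bins are assumptions. Other measures, such as a Kolmogorov-Smirnov test or KL divergence, would work too.

```python
import bisect
import math
from typing import List, Sequence


def _bin_shares(values: Sequence[float], edges: List[float], eps: float = 1e-6) -> List[float]:
    """Share of values per bin; out-of-range values are clamped into the outer bins."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        counts[min(max(i, 0), n_bins - 1)] += 1
    # eps keeps empty bins from producing log(0) or division by zero
    return [max(c / len(values), eps) for c in counts]


def psi(baseline: Sequence[float], live: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of `live` against `baseline`, using equal-width bins."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0   # guard against a constant baseline
    edges = [lo + k * width for k in range(bins + 1)]
    expected = _bin_shares(baseline, edges)
    actual = _bin_shares(live, edges)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


baseline = [i / 100 for i in range(1000)]
print(psi(baseline, baseline))                    # 0.0
print(psi(baseline, [x + 3 for x in baseline]))   # large: the distribution moved
```

A commonly quoted rule of thumb treats a PSI below about 0.1 as stable and above about 0.25 as a significant shift worth investigating. Thresholds should still be tuned per feature.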
The Plumber Always Wins
My adventure from finance to engineering was a journey from the abstract to the concrete.
On Wall Street we trafficked in projections and stories. As a data engineer, I dealt with the concrete, the plumbing, the systems that either work or fail. There is no middle ground. A pipeline is not “sort of” running.
This binary, systems-level thinking is the greatest advantage in early-stage investing.
You can’t value a seed-stage company on a spreadsheet. You can’t trust the narrative in the pitch deck. The only thing you can truly audit is the foundation. Is it built on sand, or is it built on bedrock?
Are the founders just presenting a product, or are they engineering a system?
Are they building a pretty frontend, or are they building a data factory?
The traditional investor is buying a map to a gold mine. The data engineer investor is inspecting the quality of the pickaxes, the structural integrity of the mine shaft, and the efficiency of the refinery.
In the gold rush of technology, it’s the plumber who understands the infrastructure who ultimately wins. I don’t invest in stories anymore. I invest in the plumbing.
Thank you for helping us accelerate Life in the Singularity by sharing.
I started Life in the Singularity in May 2023 to track all the accelerating changes in AI/ML, robotics, quantum computing and the rest of the technologies accelerating humanity forward into the future. I’m an investor in over a dozen technology companies and I needed a canvas to unfold and examine all the acceleration and breakthroughs across science and technology.
Our brilliant audience includes engineers and executives, incredible technologists, tons of investors, Fortune-500 board members and thousands of people who want to use technology to maximize the utility in their lives.


