December 9, 2021

The Massive Revenue Opportunity for Startups Digitizing Data

The next multibillion-dollar opportunity: digitize data trapped on paper and build a business around the data. We explore the three billion-dollar opportunities for OCR startups digitizing data: sell analytics to existing customers, sell analytics to new customers, and expand into automating near-adjacent steps in the business workflows. Or build a competing startup that eliminates paper by digitizing forms and automating processes. We balance the revenue upside with the building challenges for each of these businesses before considering how the future will unfold.

Hey friends -

If you missed last week's letter on the mechanics of OCR, you can find that here. Or you can always listen to the podcast version here! Both set the groundwork for this week's letter - the incredible opportunities for OCR startups to build on their core businesses.

In this week's letter:

  • The major revenue opportunities for OCR startups, moving towards a digital-first model, and the future of the paper and digital worlds
  • The beginning of the end for overdraft fees, studying meerkats to understand network effects, and other cocktail talk
  • My personal Mulled Wine recipe, honed over years of holiday parties

Total read time: 14 minutes, 29 seconds.

Key Takeaways

  • OCR startups sit in a hugely advantageous position - access and right to valuable financial data
  • They can build new business lines and monetize the data three ways - sell analytics and insights to existing customers, sell analytics and insights to new customers, and digitize and automate more of the business processes
  • Financial services firms have an alternative - go fully digital with digital forms, data sharing, and reinvented processes
  • Startups are well-positioned to offer fully digital services, but the more complex the offering - especially reinvented processes - the more difficult the business to build
  • The reality is that paper will be with us for a long time and financial services firms will purchase both - digitization is a rising tide that will lift startups selling both OCR and digital-first offerings

Unprecedented Data Access

OCR - the technology to digitize data trapped on paper - sits at the very beginning of many financial services processes. Think about applying for a mortgage, for example. The mortgage applicant has to submit hundreds of pages of financial documents including taxes and bank statements that help create a financial portrait of the applicant.

But the mortgage process can't begin in earnest until the data on those documents can be analyzed. That means the process starts with OCR.

That puts OCR startups in a hugely advantageous position:

  • They have access to almost all of the data that will be used to make a decision such as whether to issue a loan
  • They're first in line to analyze that data

This dual advantage unlocks opportunities for these startups, many of which they're only just beginning to realize.

Recap of last week's letter

Last week's letter was an in-depth look at why paper is difficult for machines to understand, how OCR digitizes paper, and why it remains a hugely manual effort. The key points for this week's letter are:

  • Financial services often involve lots of paperwork that requires high degrees of accuracy when digitized.
  • OCR is optical character recognition, the technology that converts ink blotches on paper into letters and words.
  • OCR can get to 99%+ degrees of accuracy, but that often isn't good enough. People provide manual reviews to get to higher levels of accuracy.
  • To get high degrees of accuracy, OCR technologies have to be manually configured (or "trained" in the case of AI) for narrow domains, such as one specific bank's mortgage documents.
  • The combination of narrow domain and significant people involvement means that OCR startups often have unusually deep insights into specific financial services processes.

From data to insights

The core business for startups that sell OCR technology is digitizing documents. Historically, their only role was to deliver the digitized words, numbers, and other content to their financial services customers. But some startups have realized that they can go beyond just digitization to build entirely new business lines.

There are three areas for growth:

  1. Analyze the digitized content and sell the insights to their existing financial services customers (other businesses)
  2. Analyze the digitized content and sell the insights to new financial services customers
  3. Expand beyond the digitized content into building technology to digitize and automate even more of the business processes

Let's explore all three.

Analyze data & sell to existing customers

Providing analytics on top of the digitized content to existing financial services customers is the easiest business for an established OCR startup to launch. They already have the business relationships and any internal approvals - such as those from procurement for vendor risk management - necessary to deliver their services.

Unless specified otherwise in a contract, OCR startups can analyze data on a customer-by-customer basis. For a startup analyzing bank statements, a valuable new product could be checking for fraud by verifying that the debits and credits do in fact add up to the total on the statement. We can imagine similar, more sophisticated analytics to check that income on tax forms matches the amounts credited to bank accounts.
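
To make the "does it add up" check concrete, here is a minimal sketch in Python. It assumes the OCR output gives us an opening balance, a closing balance, and a list of signed transaction amounts (credits positive, debits negative). The function name and the sample figures are hypothetical, and a real fraud check would need far more than this one test.

```python
from decimal import Decimal


def statement_adds_up(opening, closing, transactions):
    """Return True when opening balance plus all transactions equals the closing balance.

    Amounts are passed as strings and converted to Decimal so that cents are
    compared exactly rather than with floating-point rounding.
    """
    total = sum((Decimal(t) for t in transactions), Decimal("0"))
    return Decimal(opening) + total == Decimal(closing)


# Hypothetical digitized statements: one consistent, one with an altered closing balance
clean = statement_adds_up("1000.00", "1150.25", ["500.00", "-349.75"])
tampered = statement_adds_up("1000.00", "2150.25", ["500.00", "-349.75"])
```

A statement that fails this check is not proof of fraud - it could also be an OCR error - so it is better treated as a flag for manual review.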

Once we consider analyzing aggregated, anonymized data from across financial services customers, we hit a potential bump in the road. OCR startups need explicit language in their customer contracts that they, the startup, own the rights to those insights.

With the appropriate data rights secured, we can take the analytics a step further. A startup checking for fraud can create a "bad actors list." If the company detects that an individual has modified their bank statement while applying for a loan at Bank A, they can put the individual on the bad actors list and warn all of their other clients. That's information that's rarely shared across financial services institutions today.

When that information is shared across companies in the US, it is usually done through Section 314(b) of the Patriot Act, which permits information sharing to identify and report money laundering. Unsurprisingly, given that it is government organized and all filings must be reported to the Treasury, the paperwork required to share information under Section 314(b) is onerous. Already digitized data piped straight into the reporting could meaningfully reduce the effort.

We could also take the same digitized data and sell it to new financial services companies.

Analyze data & sell to new customers

Selling a new product to new customers is more difficult than up-selling existing customers. Not only does the OCR startup need to develop new business relationships, the new customers also won't purchase the core OCR technology. These customers are only here for the analyzed data.

New customers purchasing a new product will create significant indigestion within the OCR startup's sales organization unless managed properly. Just for starters, the contract sizes will be different, the sales process will be different, and the sales cycle will be different. Heterogeneous sales engines are difficult to manage for the best of startups. This is not something that should be pursued willy-nilly. Once it's a formal effort for the startup, it will need senior executive attention and space to grow if it's to succeed.

Let's start with the very same bank statement data again. We could use it to predict the cash needs of an individual and perhaps offer them new services. I previously highlighted startups like Brigit that help individuals avoid overdraft fees. A startup that is analyzing bank statements already has deep insights into who is suffering from overdraft fees and can refer them to Brigit and others.

We could also anonymize the data and perform peer group comparisons, like comparing monthly expenses at restaurants versus grocery stores for individuals with urban versus suburban zip codes. That would be a valuable insight to individuals considering relocating. Startups like SmartAsset that are "helping people make smart financial decisions" would be a natural buyer of those insights. So too would Updater, a startup rethinking moving services by packing internet, home security, and all of the other disparate to-dos into a single service. These aren't small clients - SmartAsset sees over 65 million daily active visitors and Updater facilitates one in every four moves in the US.

These new customers can also become partners. Insofar as the OCR startup and the new customer sell to the same financial services companies, a joint "powered by..." product offering that combines the customer's product with the OCR startup's insights may present yet another opportunity.

OCR startups have yet another opportunity. While it's arguably the biggest, it's also certainly the most complex. Rather than just deal with the digitized data, they can expand into automating the business processes where the data is used.

From digitizing paper to automating workflows

Digitizing paper in financial services is rarely done in isolation - it's usually done as part of a larger process such as approving financing for trade or issuing a mortgage. Just like getting digital data from paper, most of these processes remain highly manual.

It's an opportunity for OCR startups to expand into building workflow tools.

Mortgages entail a particularly complex workflow within banks. While it may seem like you're dealing with "the bank" when applying for a mortgage, the reality is that there are multiple departments within the bank that each need slightly different versions of your data to complete their part of the process. While underwriting is concerned with your ability to pay back the mortgage, the compliance department is concerned with making sure you're not a terrorist or engaged in fraud. They're just two of the many departments that will get involved.

Each of these departments needs a different slice of the data in their own specific formats to move ahead with their processes. Without technology, this usually means a lot of spreadsheets emailed around and a painful amount of copy-and-paste. This amounts to a lot of manual data transformation.

The OCR startup has two additional opportunities to help, beyond just digitizing the content:

  • They can also automate the data transformation process so it is delivered to each of the departments in their preferred formats, and
  • They can sell a toolkit so the bank itself can automate the data transformation into different formats.
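
As a rough illustration of what "delivered in their preferred formats" could look like, here is a small Python sketch. Everything in it is an assumption for the sake of the example: the field names, the two departments' formats, and the `transform` helper are all hypothetical and not drawn from any real bank or vendor.

```python
# Hypothetical digitized record, as it might come out of OCR
applicant = {
    "name": "Sally Example",
    "annual_income": 120000,
    "monthly_debt": 1500,
    "date_of_birth": "1985-04-12",
    "country": "US",
}

# Each department gets its own slice of the data under its own field names.
# Keys are the department's field names; values are the source field names.
DEPARTMENT_FORMATS = {
    "underwriting": {"income": "annual_income", "debt": "monthly_debt"},
    "compliance": {"full_name": "name", "dob": "date_of_birth", "country": "country"},
}


def transform(record, department):
    """Return only the fields a department needs, renamed to its format."""
    mapping = DEPARTMENT_FORMATS[department]
    return {out_field: record[src_field] for out_field, src_field in mapping.items()}


underwriting_view = transform(applicant, "underwriting")
compliance_view = transform(applicant, "compliance")
```

Keeping the formats in a configuration table rather than in code is what would let the same tool serve both bullets above: the startup can maintain the table itself, or hand the table to the bank as the toolkit.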

Built correctly, an OCR startup should be able to sell both products. The toolkit used to automate the data transformation should be the very same toolkit that the OCR startup uses itself! Said differently - what could be treated by the OCR startup as a significant expense, namely developing a tool to transform data so different departments can each get their own format, can become an entirely new revenue stream! It's the strategy currently being pursued by Hyperscience to great success.

The key to building the specific data formats and the toolkit is a deep understanding of how and by whom the data is used. It's difficult to develop those insights unless you're deeply embedded in the processes. Few except for OCR startups are as intimately involved in the processes. It's a huge advantage.

Mortgages are just one of many processes that can be automated in financial services organizations. It's a massive opportunity. But there is another approach entirely, one that doesn't involve any OCR at all. Slowly but surely, it's starting to gain real traction.

Ditch Paper & Go Full Digital

Why even bother with paper in the first place? Just digitize everything.

Digital forms

Startups like Anvil enable financial services companies to create their own digital forms and build the basic form-filling process around them. This can be helpful both for common, standardized forms like the I-9 and more complex forms like one that describes your financial situation when applying for a mortgage.

Forms are often more complex than they appear. Even completing a "basic" mortgage form may involve dividing up form sections among two spouses, each of whom will complete their respective section. It may also involve the spouses' financial advisor, who may serve as a reviewer or actually help complete the form. These nuances evolve into complex workflows quite rapidly, which becomes evident as we flesh out the scenario:

  • Timmy and Sally are jointly applying for a mortgage where each will enter their respective information and they ask financial advisor Becky to complete the "please detail your investments" section.
  • Before form filling begins, Timmy assigns the relevant questions to the relevant people. He also sets up Sally and himself as approvers (two approvals required before signature) and as signatories.
  • Timmy and Sally can answer their questions in parallel. Becky receives an email once they've completed their sections asking her to complete the investment sections.
  • Once Becky is finished, Timmy and Sally receive emails asking them to review all questions and approve the form for signature. They approve.
  • Timmy and Sally receive another email asking them to sign. They sign.
  • Everyone receives an email with a fully signed copy.
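
The scenario above can be sketched as a small state model in Python. This is a toy under stated assumptions, not how Anvil or anyone else builds it: the `FormWorkflow` class, its method names, and its status labels are all hypothetical, and emails, reminders, and permissions are left out entirely.

```python
class FormWorkflow:
    """Toy model of the fill -> approve -> sign sequence described above."""

    def __init__(self, fillers, approvers, signers):
        # fillers maps each person to the set of people who must finish first
        self.fillers = fillers
        self.approvers = set(approvers)
        self.signers = set(signers)
        self.filled, self.approved, self.signed = set(), set(), set()

    def can_fill(self, person):
        # A person may fill once everyone they depend on has finished
        return person in self.fillers and self.fillers[person] <= self.filled

    def fill(self, person):
        if not self.can_fill(person):
            raise ValueError(f"{person} cannot fill their section yet")
        self.filled.add(person)

    def approve(self, person):
        # Approvals only open once every section has been filled
        if self.filled != set(self.fillers) or person not in self.approvers:
            raise ValueError(f"{person} cannot approve yet")
        self.approved.add(person)

    def sign(self, person):
        # Signatures only open once every approver has approved
        if self.approved != self.approvers or person not in self.signers:
            raise ValueError(f"{person} cannot sign yet")
        self.signed.add(person)

    def status(self):
        if self.signed == self.signers:
            return "signed"
        if self.approved == self.approvers:
            return "awaiting signatures"
        if self.filled == set(self.fillers):
            return "awaiting approvals"
        return "filling"


# Timmy and Sally fill in parallel; Becky waits for both of them
form = FormWorkflow(
    fillers={"Timmy": set(), "Sally": set(), "Becky": {"Timmy", "Sally"}},
    approvers={"Timmy", "Sally"},
    signers={"Timmy", "Sally"},
)
```

Even this stripped-down version needs three sets of people and four states, which hints at why hiding the complexity from everyday users is so hard.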

That's wildly complex! Simplifying that complexity so everyday, non-tech-savvy users can actually use the digital forms is one of the major challenges for startups like Anvil. If successful, the upside for financial services customers and end-users can be significant. Data a user previously entered can be pre-populated into future forms to save time and effort. Just like with OCR, the output data can then be passed directly to where it's needed by the financial services firm.

The biggest adoption hurdle for traditional financial services firms is that these forms don't exist in isolation - they exist as part of larger processes. Introducing digital forms means rejiggering processes that have already been reviewed and approved by compliance and often involve many different departments. The change is larger than simply swapping out paper for digital.

Digital data sharing

Removing forms is one step in going fully digital. Enabling customers to share information from other financial institutions - like bank statements - is a further step. Digital data sharing usually involves specialist intermediaries who should have no rights to your data - they're just the pipes - but that's rarely the case. Most of us rarely read the fine print in terms and conditions and few regulations prohibit them from accessing the data. So they do it.

One of these intermediaries is Plaid. If you've ever gotten a pop-up when filing taxes or opening a brokerage account that says, "Would you like to allow XYZ to access your ABC bank account?", then you've used it to grant access to your bank account. By their own numbers, 1 in 4 Americans has used the service.

That's quite the target for hackers. While Plaid goes to great lengths to state that "Plaid only shares your data with your consent," the cookies on their website clearly indicate they sell website tracking data to Bing Ads, Facebook Pixel, Google Ads (Gtag), Google Tag Manager, LinkedIn Insight Tag, Podsights, Twitter Ads, 6Sense, and The Trade Desk. The privacy policy itself allows unfettered use of data to enhance and develop new services, including by sharing the data with third parties who will do the work.

While Plaid has not had a publicly reported security breach and their policies do appear to take consumer data rights seriously, they take risks with your and my data that are not obvious unless you read the minutiae of their policies. A financial services firm that uses Plaid will likely be fined if Plaid has a security breach. That's a cost that may materially outweigh the expected benefits of allowing customers to avoid uploading paper copies.

There is another option - digitize and automate the business process, not just the forms and data sharing.

Digital, automated business processes

The depth of knowledge required to automate a business process is significant. The technical and design challenges can be equally daunting. Not only does a startup need to build technology that can automate a process, they also need to reconfigure the technology for every financial services customer, each of whom will have a slightly different version of the process.

The dual requirements of deep domain expertise and the complexity of a system that can be reconfigured per client have led startups to focus on narrow problems, often starting with just a piece of one business process. Blend launched in 2012 just focusing on automating mortgages. Even today, as a $2.1 billion public company that is a leader in the space, Blend has just 330 customers and only supports processes for mortgages and consumer banking.

Their challenges are a stark reminder of just how difficult it is to build, sell, and deploy a digital platform that can bring old processes into the twenty-first century even when the customer experience is orders of magnitude better. When a financial services company chooses to purchase Blend's products, they're concurrently committing to overhauling their existing processes. That's inevitably a multi-year effort that will cost millions of dollars.

The flip side is that this tremendous investment by a financial services firm ensures that, once sold, Blend's product is highly unlikely to be replaced. Not only would the financial institution need to find a new vendor and spend many millions more deploying new software, but they'd also have to retrain their workforce (again!) on a new system.

For Blend and others who manage to build compelling, narrow domain business process digitization and automation platforms, that's a wonderful place to be. This stickiness shows up in the numbers. In 2020, Blend had a 167% net dollar retention. That means for every $100 their clients spent with them last year, they spent $167 this year! That's without even considering new customers!
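
For anyone who wants to see the arithmetic, here is a minimal Python sketch of net dollar retention. The customer names and spend figures are made up to reproduce a 167% result - they are not Blend's actual numbers - and the calculation uses the common definition in which only customers from the prior period count.

```python
def net_dollar_retention(prior_spend, current_spend):
    """This year's spend from last year's customers, divided by last year's spend.

    Customers who churned count as zero; customers who are new this year
    are ignored entirely.
    """
    prior_total = sum(prior_spend.values())
    current_total = sum(current_spend.get(name, 0) for name in prior_spend)
    return current_total / prior_total


# Hypothetical spend by customer, in dollars
last_year = {"Bank A": 100, "Bank B": 200}
this_year = {"Bank A": 167, "Bank B": 334, "Bank C": 50}  # Bank C is new, so excluded

ndr = net_dollar_retention(last_year, this_year)
print(f"{ndr:.0%}")
```

Anything above 100% means existing customers are spending more than enough to offset those who cut back or left.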

The Digital Future Still Includes A Lot of Paper

Many financial services processes will continue to be digitized, but that transformation will happen over decades, not years. Even as parts of the processes are transformed to a digital-first approach, paper will remain prevalent. That leaves a significant opportunity for OCR startups to thrive even as less paper is used.

Imagine a bank that purchases Blend's mortgage product, uses Plaid so customers can share their bank data from other banks, and provides fully digital forms for any one-off forms that need to be completed. Even for such a digital bank, there will be paper.

Identity documents including passports and licenses immediately come to mind. US passports are all paper-based. While Apple recently announced that 8 states will soon support digital licenses in Apple Wallet on iPhone and Apple Watch, licenses remain paper-based for the other 42 states.

If business records are required as well, expect a lot more paper. OCR startups like Ocrolus will continue to have no shortage of opportunities. Take the recent PPP loans. They processed submitted applications for upwards of 65% of all approved loans - over $500 billion!

Digitizing documents, forms, and processes is a case of a rising tide and expanding pie. There's space for all types of startups. Beyond accuracy and related executional competencies, what will differentiate the winners will be their ability to go from their initial niche to the broader opportunities. Financial services companies are inherently slow to change. It is difficult to sell them a product that alters decades-old processes. But, once sold, the data is rich and the product sticky. It's up to the startups to take advantage of those positions.

Cocktail Talk

  • Capital One, a bank, announced that they will no longer charge overdraft fees, putting an end to a $150 million annual revenue stream for the company. In a previous letter, I dug into how overdraft fees work and the changing landscape. I stated that "[t]hese startups and banks are winning customers by offering better services at cheaper prices... My hope is that this continues in earnest." It's great to see a major bank like Capital One move in the right direction. (CNBC)
  • Good news from the world of nuclear energy! Nuclear fusion startup Commonwealth Fusion Systems raised more than $1.8 billion from investors. Fusion is the process of creating energy by combining atoms. To date, fusion hasn’t worked - all attempted efforts consume more energy than they output. But that failure is in large part due to a lack of invested capital necessary to prove out the technology. Commonwealth is a 2018 MIT spinout, now pursuing production as a commercial rather than purely academic concern. It will take a decade or more to begin to meaningfully realize the potential of fusion, but this is certainly a meaningful start. (WSJ)
  • Metcalfe’s Law 👉 Meerkat’s Law. Talk about combining many of my favorite things. Andrew Chen of a16z, a venture capital firm, explores the world of social animals including meerkats to better understand network effects. While animals may seem simple if not closely observed, they in fact exhibit wonderfully complex, adaptive behaviors. They present a rich space for exploration to better understand ourselves and how we interact. (a16z)
  • Square, a payments company, changed its name to Block to reflect its increasing focus on blockchain and cryptocurrency. They launched with some, uh, interesting profile pictures for the leadership team. Thanks to some enterprising developers, you too can now be a blockhead! (Blockify)

Your Weekly Cocktail

For quiet nights and holiday parties - mulled wine is a seasonal classic.

Jared's Mulled Wine

2 bottles Cheap Red Wine
3/4 cup Honey
1-inch piece Fresh Ginger, sliced
2 Cinnamon Sticks
3 Star Anise
1/2 tbsp Black Peppercorns
6 Allspice Berries
2 Cardamom Pods, slightly crushed
Peel from ~1/2 an Orange
Peel from ~1/2 a Lemon
Cheesecloth
Twine

Place all of the ingredients, except the wine and honey, into a cheesecloth bag (you can make a bag by tying the four corners). Pour the wine into a large pot and bring to a simmer. Stir in the honey so it dissolves. Tie one end of the twine to the cheesecloth bag and the other to the handle of the pot so that the bag is suspended in the wine (you don't want it sitting on the bottom of the pot and burning). Simmer for at least 1.5 hours. Serve piping hot.

Some helpful notes from my follies:

  • Do not use pre-ground spices - no one likes drinking grit.
  • More expensive wine won't make a meaningful difference.
  • Avoid dry and high-tannin wines. If the end result isn't sweet enough for your liking, try adding more honey, brandy, or Grand Marnier. Do it slowly and taste frequently! You can't take it back out.
  • This recipe scales really well for large groups - I recently did a 6-bottle version. Keep the wine-to-honey ratio the same. Double the other ingredients for every 3x increase in wine.
  • Have fun! This recipe is not an exact science. Add some ingredients, remove others, and change the ratios. Simmer it for longer.

I only make this once or twice a year and always for large groups of friends. The smells as steam rises from a hot pot of mulled wine are wonderfully enticing - it reminds me of exploring French marchés in the days leading up to Christmas. With mugs stacked to the side, you can leave a pot on the stovetop for an entire party as guests serve themselves. In the very unlikely event that you have mulled wine left over, it keeps reasonably well in a jar or back in the wine bottle for a couple of weeks. Just reheat to enjoy again and again!

Cheers,
Jared
