The next multibillion-dollar opportunity: digitize data trapped on paper and build a business around the data. We explore the three billion-dollar opportunities for OCR startups digitizing data: sell analytics to existing customers, sell analytics to new customers, and expand into automating near-adjacent steps in the business workflows. Or build a competing startup that eliminates paper by digitizing forms and automating processes. We balance the revenue upside with the building challenges for each of these businesses before considering how the future will unfold.
Hey friends -
If you missed last week's letter on the mechanics of OCR, you can find that here. Or you can always listen to the podcast version here! Both set the groundwork for this week's letter - the incredible opportunities for OCR startups to build on their core businesses.
In this week's letter:
Total read time: 14 minutes, 29 seconds.
OCR - the technology to digitize data trapped on paper - sits at the very beginning of many financial services processes. Think about applying for a mortgage, for example. The mortgage applicant has to submit hundreds of pages of financial documents including taxes and bank statements that help create a financial portrait of the applicant.
But the mortgage process can't begin in earnest until the data on those documents can be analyzed. That means the process starts with OCR.
That puts OCR startups in a hugely advantageous position:
This dual advantage unlocks opportunities for these startups, many of which they're only just beginning to realize.
Last week's letter was an in-depth look at why paper is difficult for machines to understand, how OCR digitizes paper, and why it remains a hugely manual effort. The key points for this week's letter are:
The core business for startups that sell OCR technology is digitizing documents. Historically, their only role was to deliver the digitized words, numbers, and other content to their financial services customers. But some startups have realized that they can go beyond just digitization to build entirely new business lines.
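There are three areas for growth: selling analytics to existing customers, selling analytics to new customers, and automating near-adjacent steps in the business workflows.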
Let's explore all three.
Providing analytics on top of the digitized content to existing financial services customers is the easiest business for an established OCR startup to launch. They already have the business relationships and any internal approvals - such as those from procurement for vendor risk management - necessary to deliver their services.
Unless specified otherwise in a contract, OCR startups can analyze data on a customer-by-customer basis. For a startup analyzing bank statements, a valuable new product could be checking for fraud by verifying that the debits and credits do in fact add up to the total on the statement. We can imagine similar, more sophisticated analytics to check that income on tax forms matches the amounts credited to bank accounts.
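To make the bank statement check concrete, here is a minimal sketch of the arithmetic. It assumes the digitized statement gives us an opening balance, a list of signed transactions (credits positive, debits negative), and a closing balance. The function name and the sample figures are hypothetical, not taken from any particular OCR product.

```python
from decimal import Decimal


def statement_reconciles(opening_balance, transactions, closing_balance):
    """Return True if opening balance plus all signed transactions equals the closing balance.

    Assumed convention: credits are positive Decimals, debits are negative Decimals.
    """
    computed = opening_balance + sum(transactions, Decimal("0"))
    return computed == closing_balance


# Hypothetical digitized statement values.
opening = Decimal("1200.00")
txns = [Decimal("2500.00"), Decimal("-850.25"), Decimal("-49.99")]

print(statement_reconciles(opening, txns, Decimal("2799.76")))  # True
print(statement_reconciles(opening, txns, Decimal("3799.76")))  # False
```

Using `Decimal` rather than floats avoids rounding noise when comparing currency amounts. A statement that fails the check is not proof of fraud (the OCR itself may have misread a digit), so in practice a mismatch would more likely flag the document for review.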
Once we consider analyzing aggregated, anonymized data from across financial services customers, we hit a potential bump in the road. OCR startups need explicit language in their customer contracts that they, the startup, own the rights to those insights.
With the appropriate data rights secured, we can take the analytics a step further. A startup checking for fraud can create a "bad actors list." If the company detects that an individual has modified their bank statement while applying for a loan at Bank A, they can put the individual on the bad actors list and warn all of their other clients. That's information that's rarely shared across financial services institutions today.
When that information is shared across companies in the US, it is usually done through Section 314(b) of the Patriot Act, which permits information sharing to identify and report money laundering. Unsurprisingly, given that it is government organized and all filings must be reported to the Treasury, the paperwork required to share information under Section 314(b) is onerous. Already digitized data piped straight into the reporting could meaningfully reduce the effort.
We could also take the same digitized data and sell it to new financial services companies.
Selling a new product to new customers is more difficult than up-selling existing customers. Not only does the OCR startup need to develop new business relationships, the new customers also won't purchase the core OCR technology. These customers are only here for the analyzed data.
New customers purchasing a new product will create significant indigestion within the OCR startup's sales organization unless managed properly. Just for starters, the contract sizes will be different, the sales process will be different, and the sales cycle will be different. Heterogeneous sales engines are difficult to manage for the best of startups. This is not something that should be pursued willy-nilly. Once it's a formal effort for the startup, it will need senior executive attention and space to grow if it's to succeed.
Let's start with the very same bank statement data again. We could use it to predict the cash needs of an individual and perhaps offer them new services. I previously highlighted startups like Brigit that help individuals avoid overdraft fees. A startup that is analyzing bank statements already has deep insights into who is suffering from overdraft fees and can refer them to Brigit and others.
We could also anonymize the data and perform peer group comparisons, like comparing monthly expenses at restaurants versus grocery stores for individuals with urban versus suburban zip codes. That would be a valuable insight to individuals considering relocating. Startups like SmartAsset that are "helping people make smart financial decisions" would be a natural buyer of those insights. So too would Updater, a startup rethinking moving services by packing internet, home security, and all of the other disparate to-dos into a single service. These aren't small clients - SmartAsset sees over 65 million daily active visitors and Updater facilitates one in every four moves in the US.
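A peer group comparison like the one described above can be sketched in a few lines. The records below are made up for illustration, and the field layout (area type, spending category, monthly spend) is an assumption about what the anonymized data might look like.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical anonymized records: (area_type, category, monthly_spend_in_dollars).
records = [
    ("urban", "restaurants", 420.0),
    ("urban", "restaurants", 380.0),
    ("urban", "groceries", 310.0),
    ("urban", "groceries", 290.0),
    ("suburban", "restaurants", 210.0),
    ("suburban", "restaurants", 190.0),
    ("suburban", "groceries", 520.0),
    ("suburban", "groceries", 480.0),
]


def peer_group_averages(rows):
    """Average monthly spend for each (area_type, category) peer group."""
    groups = defaultdict(list)
    for area_type, category, spend in rows:
        groups[(area_type, category)].append(spend)
    return {key: mean(values) for key, values in groups.items()}


averages = peer_group_averages(records)
for (area_type, category), avg in sorted(averages.items()):
    print(f"{area_type:9s} {category:12s} {avg:8.2f}")
```

A real product would also need a minimum group size before publishing an average, so that a peer group of one or two people doesn't reveal an individual's spending.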
These new customers can also become partners. Insofar as the OCR startup and the new customer sell to the same financial services companies, a joint "powered by..." product offering that combines the customer's product with the OCR startup's insights may present yet another opportunity.
OCR startups have yet another opportunity. While it's arguably the biggest, it's also certainly the most complex. Rather than just deal with the digitized data, they can expand into automating the business processes where the data is used.
Digitizing paper in financial services is rarely done in isolation - it's usually done as part of a larger process such as approving financing for trade or issuing a mortgage. Just like getting digital data from paper, most of these processes remain highly manual.
It's an opportunity for OCR startups to expand into building workflow tools.
Mortgages entail a particularly complex workflow within banks. While it may seem like you're dealing with "the bank" when applying for a mortgage, the reality is that there are multiple different departments within the bank that each needs slightly different versions of your data to complete their part of the process. While underwriting is concerned with your ability to pay back the mortgage, the compliance department is concerned with making sure you're not a terrorist or engaged in fraud. They're just two of the many departments that will get involved.
Each of these departments needs a different slice of the data in their own specific formats to move ahead with their processes. Without technology, this usually means a lot of spreadsheets emailed around and a painful amount of copy-and-paste. This amounts to a lot of manual data transformation.
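As a rough illustration of what "a different slice of the data" means, the sketch below cuts one digitized applicant record into department-specific views. The field names and the mapping of fields to departments are assumptions made for the example; a real bank's requirements would be far longer and more nuanced.

```python
# Hypothetical digitized applicant record.
applicant = {
    "name": "Jane Doe",
    "date_of_birth": "1985-04-12",
    "annual_income": 98000,
    "monthly_debt_payments": 1450,
    "country_of_residence": "US",
}

# Assumed mapping of which fields each department needs.
DEPARTMENT_VIEWS = {
    "underwriting": ["name", "annual_income", "monthly_debt_payments"],
    "compliance": ["name", "date_of_birth", "country_of_residence"],
}


def slice_for(department, record):
    """Return only the fields the given department needs, in that department's order."""
    return {field: record[field] for field in DEPARTMENT_VIEWS[department]}


print(slice_for("underwriting", applicant))
print(slice_for("compliance", applicant))
```

Keeping the mapping in configuration rather than in code is what lets the same tool be reconfigured for each department, or each customer, without copy-and-paste.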
The OCR startup has two additional opportunities to help, beyond just digitizing the content:
Built correctly, an OCR startup should be able to sell both products. The toolkit used to automate the data transformation should be the very same toolkit that the OCR startup uses itself! Said differently - what could be treated by the OCR startup as a significant expense, namely developing a tool to transform data so different departments can each get their own format, can become an entirely new revenue stream! It's the strategy currently being pursued by Hyperscience to great success.
The key to building the specific data formats and the toolkit is a deep understanding of how and by whom the data is used. It's difficult to develop those insights unless you're deeply embedded in the processes. Few except for OCR startups are as intimately involved in the processes. It's a huge advantage.
Mortgages are just one of many processes that can be automated in financial services organizations. It's a massive opportunity. But there is another approach entirely, one that doesn't involve any OCR at all. Slowly but surely, it's starting to gain real traction.
Why even bother with paper in the first place? Just digitize everything.
Startups like Anvil enable financial services companies to create their own digital forms and build the basic form-filling process around them. This can be helpful both for common, standardized forms like the I-9 and more complex forms like one that describes your financial situation when applying for a mortgage.
Forms are often more complex than they appear. Even completing a "basic" mortgage form may involve dividing up form sections among two spouses, each of whom will complete their respective section. It may also involve the spouses' financial advisor, who may serve as a reviewer or actually help complete the form. These nuances evolve into complex workflows quite rapidly, which becomes evident as we flesh out the scenario:
That's wildly complex! Simplifying that complexity so everyday, non-tech-savvy users can actually use the digital forms is one of the major challenges for startups like Anvil. If successful, the upside for financial services customers and end-users can be significant. Data a user previously entered can be pre-populated into future forms to save time and effort. Just like with OCR, the output data can then be passed directly to where it's needed by the financial services firm.
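Pre-populating a form from earlier answers can be sketched as a simple lookup. This is an illustration only, not how Anvil or any other vendor implements it; the field names and function names are hypothetical.

```python
def prepopulate(form_fields, known_answers):
    """Fill a new form from previously entered values; unknown fields stay None."""
    return {field: known_answers.get(field) for field in form_fields}


def missing_fields(filled_form):
    """List the fields the user still has to complete by hand."""
    return [field for field, value in filled_form.items() if value is None]


# Hypothetical answers saved from an earlier form.
saved = {"full_name": "Jane Doe", "address": "12 Main St", "employer": "Acme Corp"}

# Hypothetical fields on a new mortgage form.
mortgage_form = ["full_name", "address", "employer", "annual_income"]

filled = prepopulate(mortgage_form, saved)
print(filled)
print(missing_fields(filled))  # ['annual_income']
```

The user only has to supply what the system has never seen before, which is where the time savings come from.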
The biggest adoption hurdle for traditional financial services firms is that these forms don't exist in isolation - they exist as part of larger processes. Introducing digital forms means rejiggering processes that have already been reviewed and approved by compliance and often involve many different departments. The change is larger than simply swapping out paper for digital.
Removing forms is one step in going fully digital. Enabling customers to share information from other financial institutions - like bank statements - is a further step. Digital data sharing usually involves specialist intermediaries who should have no rights to your data - they're just the pipes - but that's rarely the case. Most of us rarely read the fine print in terms and conditions and few regulations prohibit them from accessing the data. So they do it.
One of these intermediaries is Plaid. If you've ever gotten a pop-up when filing taxes or opening a brokerage account that says, "Would you like to allow XYZ to access your ABC bank account?", then you've used it to grant access to your bank account. By their own numbers, 1 in 4 Americans has used the service.
That's quite the target for hackers. While Plaid goes to great lengths to state that "Plaid only shares your data with your consent," the cookies on their website clearly indicate they sell website tracking data to Bing Ads, Facebook Pixel, Google Ads (Gtag), Google Tag Manager, LinkedIn Insight Tag, Podsights, Twitter Ads, 6Sense, and The Trade Desk. The privacy policy itself allows unfettered use of data to enhance and develop new services, including by sharing the data with third parties who will do the work.
While Plaid has not had a publicly reported security breach and their policies do appear to take consumer data rights seriously, they take risks with your and my data that are not obvious unless you read the minutiae of their policies. A financial services firm that uses Plaid will likely be fined if Plaid has a security breach. That's a cost that may materially outweigh the expected benefits of allowing customers to avoid uploading paper copies.
There is another option - digitize and automate the business process, not just the forms and data sharing.
The depth of knowledge required to automate a business process is significant. The technical and design challenges can be equally daunting. Not only does a startup need to build technology that can automate a process, they also need to reconfigure the technology for every financial services customer, each of whom will have a slightly different version of the process.
The dual requirements of deep domain expertise and the complexity of a system that can be reconfigured per client have led startups to focus on narrow problems, often starting with just a piece of one business process. Blend launched in 2012 just focusing on automating mortgages. Even today, as a $2.1 billion public company that is a leader in the space, Blend has just 330 customers and only supports processes for mortgages and consumer banking.
Their challenges are a stark reminder of just how difficult it is to build, sell, and deploy a digital platform that can bring old processes into the twenty-first century even when the customer experience is orders of magnitude better. When a financial services company chooses to purchase Blend's products, they're concurrently committing to overhauling their existing processes. That's inevitably a multi-year effort that will cost millions of dollars.
The flip side is that this tremendous investment by a financial services firm ensures that, once sold, Blend's product is highly unlikely to be replaced. Not only would the financial institution need to find a new vendor and spend many millions more deploying new software, but they'd also have to retrain their workforce (again!) on a new system.
For Blend and others who manage to build compelling, narrow domain business process digitization and automation platforms, that's a wonderful place to be. This stickiness shows up in the numbers. In 2020, Blend had a 167% net dollar retention. That means for every $100 their clients spent with them last year, they spent $167 this year! That's without even considering new customers!
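The net dollar retention arithmetic is simple enough to write down. The sketch below assumes the common definition: this year's revenue from last year's customers, divided by last year's revenue from those same customers. The function name and the $100 and $167 inputs are illustrative, not Blend's reported figures.

```python
def net_dollar_retention(prior_year_revenue, current_year_revenue_same_customers):
    """Revenue this year from last year's customers, as a ratio of what they spent last year.

    New customers are excluded from both numbers by definition.
    """
    return current_year_revenue_same_customers / prior_year_revenue


ndr = net_dollar_retention(100.0, 167.0)
print(f"{ndr:.0%}")  # 167%
```

Anything above 100% means the existing customer base is growing on its own, before a single new customer is signed.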
Many financial services processes will continue to be digitized, but that transformation will happen over decades not years. Even as parts of the processes are transformed to a digital-first approach, paper will remain prevalent. That leaves a significant opportunity for OCR startups to thrive even as less paper is used.
Imagine a bank that purchases Blend's mortgage product, uses Plaid so customers can share their bank data from other banks, and provides fully digital forms for any one-off forms that need to be completed. Even for such a digital bank, there will be paper.
Identity documents including passports and licenses immediately come to mind. US Passports are all paper-based. While Apple recently announced that 8 states will soon support digital licenses in the Apple Wallet on iPhone and Apple Watch, licenses remain paper-based for the other 42 states.
If business records are required as well, expect a lot more paper. OCR startups like Ocrolus will continue to have no shortage of opportunities. Take the recent PPP loans. Ocrolus processed submitted applications for upwards of 65% of all approved loans - over $500 billion!
Digitizing documents, forms, and processes is a case of a rising tide and expanding pie. There's space for all types of startups. Beyond accuracy and related executional competencies, what will differentiate the winners will be their ability to go from their initial niche to the broader opportunities. Financial services companies are inherently slow to change. It is difficult to sell them a product that alters decades-old processes. But, once sold, the data is rich and the product sticky. It's up to the startups to take advantage of those positions.
For quiet nights and holiday parties - mulled wine is a seasonal classic.
2 bottles Cheap Red Wine
3/4 cup Honey
1-inch piece Fresh Ginger, sliced
2 Cinnamon Sticks
3 Star Anise
1/2 tbsp Black Peppercorns
6 Allspice Berries
2 Cardamom Pods, slightly crushed
Peel from ~1/2 an Orange
Peel from ~1/2 a Lemon
Cheesecloth
Twine
Place all of the ingredients, except the wine and honey, into a cheesecloth bag (you can make a bag by tying the four corners). Pour the wine into a large pot and bring to a simmer. Stir in the honey so it dissolves. Tie one end of the twine to the cheesecloth bag and the other to the handle of the pot so that the bag is suspended in the wine (you don't want it sitting on the bottom of the pot and burning). Simmer for at least 1.5 hours. Serve piping hot.
Some helpful notes from my follies:
I only make this once or twice a year and always for large groups of friends. The smells as steam rises from a hot pot of mulled wine are wonderfully enticing - it reminds me of exploring French marchés in the days leading up to Christmas. With mugs stacked to the side, you can leave a pot on the stovetop for an entire party as guests serve themselves. In the very unlikely event that you have mulled wine left over, it keeps reasonably well in a jar or back in the wine bottle for a couple of weeks. Just reheat to enjoy again and again!
Cheers,
Jared