“During a gold rush, sell shovels.”
The famous California Gold Rush began in January 1848, when James W. Marshall discovered gold at Sutter’s Mill in California. As news spread the following year, roughly 300,000 “49ers” from across the world made the mad dash to California to stake their claim to riches. Today, with the onset of generally available AI, we’re experiencing a modern gold rush of potentially exponential proportions. How can today’s pioneers ensure they get their share of the windfall? While AI is new, we can learn from the past: during a gold rush, sell shovels. In other words, in an AI gold rush, monetize data.
While Artificial Intelligence (AI) and Machine Learning (ML) have been buzzwords in the tech space for the last few years, since ChatGPT launched in November 2022, “AI” has had the same electrifying effect as the news of gold from California. ChatGPT opened the floodgates by democratizing access to large language models (LLMs). That moment sparked an “AI gold rush” wherein every Board, VC, and business leader instantly became desperate not to be left behind. The seemingly obvious path to success is to participate directly in the rush: to stake a claim and start digging in hopes of uncovering a means to grow the business through better insights or streamlined operations. However, plenty of organizations aren’t actively in the mountains digging for gold. The costly mistake would be thinking the only way to benefit from the upheaval is to drop everything and run for the hills.
As prospectors rushed into California in the late 1840s, wily entrepreneur Sam Brannan sensed an opportunity of his own. He spread the news of the gold discovery throughout San Francisco to feed the frenzy and drive sales to his general store at Sutter’s Fort, where he happened to sell all the equipment and supplies a new-to-town prospector might need to search for gold: shovels, pickaxes, and more.
The shovel equivalent for today’s AI gold rush? Data. AI needs a ton of data: a corpus of diverse data available to develop and train models to produce trustworthy results. Until now, many companies have acquired their data from enterprise data aggregators. These companies collect and process data from many sources, with a distribution model that relies on the sale of massive datasets. Their service is clearly necessary, but their reach has been limited by the need to navigate the growing complexity of data security, privacy, and protection regulations for all regions and markets in which they operate. And today, they’re no longer the only ones with a valuable dataset to offer.
In Spring 2023, we wrote about the data-acquisition initiatives seen in healthcare, and a recent article published by The Economist provides further examples and discusses some of the opportunities and challenges. Every organization has data that could provide critical insight to another organization. Think of how grocery stores offer discounts when you use your membership card, or the benefits of using your frequent flyer number with an airline. While those programs were initially created to build customer loyalty through perks, today there’s gold in those data stores. But it’s not just massive enterprises; small and medium businesses also have an opportunity to monetize their data. For any organization, the trick will be extracting and sharing that data safely and cost-effectively.
Ensure Data Quality
The first step in data monetization is preparing your data to ensure its quality. It may be simplest to start with structured data already stored in typical databases or spreadsheets, formatted in standard rows and columns with clearly defined attributes. This data is the easiest for ML and AI to use, and it’s also easier to clean via automated removal of duplicates, irrelevant data, and outliers.
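As a concrete illustration, a minimal cleaning pass over structured records might look like the sketch below. This is pure Python with no external dependencies; the `purchase_total` field, the sample rows, and the 3.5 modified z-score cutoff are illustrative assumptions, not a prescription.

```python
import statistics

def clean_records(records, value_key="purchase_total", cutoff=3.5):
    """Deduplicate records, then drop outliers via a modified z-score (MAD)."""
    # Remove exact duplicates while preserving order
    seen, deduped = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))   # exact-duplicate fingerprint
        if key not in seen:
            seen.add(key)
            deduped.append(rec)

    # Median absolute deviation is robust: a huge outlier can't mask itself
    # by inflating the spread the way it would with mean/standard deviation.
    values = [rec[value_key] for rec in deduped]
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:                           # all values identical: nothing to flag
        return deduped
    return [r for r in deduped
            if 0.6745 * abs(r[value_key] - med) / mad <= cutoff]

rows = [
    {"id": 1, "purchase_total": 42.0},
    {"id": 1, "purchase_total": 42.0},     # exact duplicate
    {"id": 2, "purchase_total": 44.0},
    {"id": 3, "purchase_total": 41.0},
    {"id": 4, "purchase_total": 9_999.0},  # likely a data-entry error
]
clean = clean_records(rows)                # duplicate and outlier removed
```

Real pipelines would layer on schema validation and domain rules, but the shape is the same: fingerprint for duplicates, then apply a robust statistical test for outliers.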
Unstructured data is information in formats such as customer comment text fields, social media posts, and even images or videos. The benefit and curse of unstructured data is that massive amounts of it can be stored in unexpected places across the organization. It can provide extraordinary insight, but it’s also far more challenging to clean and prepare for sharing. You can automate some of this work with parsing, normalization, and deduplication, but some manual review may still be involved.
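For text fields, the automated portion often amounts to normalizing each comment and collapsing near-duplicates that differ only in markup, casing, or punctuation. A minimal sketch, with illustrative sample comments:

```python
import re
import unicodedata

def normalize_comment(text):
    """Reduce a free-text comment to a canonical form for comparison."""
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")   # drop accents
    text = re.sub(r"<[^>]+>", " ", text)                    # strip stray HTML tags
    text = re.sub(r"[^\w\s]", " ", text.lower())            # remove punctuation
    return re.sub(r"\s+", " ", text).strip()                # collapse whitespace

def dedupe_comments(comments):
    """Keep one representative per normalized form (near-duplicate removal)."""
    seen, kept = set(), []
    for c in comments:
        key = normalize_comment(c)
        if key and key not in seen:
            seen.add(key)
            kept.append(c)
    return kept

raw = [
    "Great service!!!",
    "<b>Great</b> service",      # same comment, copied with markup
    "great   SERVICE",           # same comment, different casing/spacing
    "Shipping was slow.",
]
unique = dedupe_comments(raw)    # two distinct comments survive
```

This is where the manual review mentioned above comes in: normalization catches cosmetic duplicates, but judging whether two differently worded comments say the same thing still takes a human (or a more sophisticated model).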
Understand Data Regulations and Regional Restrictions
When scoping your data monetization project, you’ll need to understand which data you hold would be valuable to which target market. Key to this market definition is comprehending the relevant data regulations and restrictions. For example, if you happen to hold personal health information (PHI), it could be priceless to a consumer medical device company, but HIPAA defines where, why, and how you can share that data. If you hold personally identifiable information (PII) for citizens of the EU, you’ll need to ensure compliance with GDPR requirements for data privacy and security.
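As one illustrative preparation step (not a compliance program in itself), direct identifiers can be pseudonymized with a keyed hash before sharing, so records remain joinable across datasets without exposing the raw values. The field names below are hypothetical assumptions for the sketch:

```python
import hashlib
import hmac

# Hypothetical direct identifiers of the kind regulations such as
# HIPAA and GDPR treat as sensitive; a real list comes from legal review.
DIRECT_IDENTIFIERS = {"name", "email", "ssn"}

def pseudonymize(record, secret_key):
    """Replace direct identifiers with a keyed hash. The same input and key
    always yield the same token, so datasets stay joinable, but the raw
    value cannot be recovered without the key."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            digest = hmac.new(secret_key, str(value).encode(), hashlib.sha256)
            out[field] = digest.hexdigest()[:16]   # truncated token
        else:
            out[field] = value                     # non-identifying data passes through
    return out

patient = {"name": "Ada Lovelace", "email": "ada@example.com", "diagnosis_code": "E11.9"}
shared = pseudonymize(patient, secret_key=b"rotate-me-regularly")
```

Note that pseudonymized data can still be re-identifiable in combination with other attributes, which is one reason the privacy-enhancing technologies discussed next go further than hashing.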
Utilize Privacy-Enabled Services
In the past, the barriers to sharing this information safely and according to relevant regulations might have been so high that only the largest data aggregators mentioned above could meet them (while also imposing a high TCO for the customer). However, modern privacy-enhancing technologies (PETs) that allow sensitive data to be computed on without revealing the underlying data can make secure data distribution much more accessible. Duality’s platform, for example, enables organizations to leverage sensitive data to train machine learning models or run inference on decentralized data without revealing the native sensitive information, PII, or IP.
These privacy-enhanced solutions (using PETs) can expand small and mid-size companies’ total addressable market by allowing them to offer data access more easily and cost-effectively in regions and industries with restrictive data privacy regulations. These technologies can unlock new sales models for corporate aggregators, with lower price points and lower overhead (TCO) for customers. Because they don’t require entire (expensive and sensitive) datasets to be transferred to the customer (most common when engaging with government agencies), the customer is no longer forced to take on the debt of data ownership, from IT to security to privacy and compliance.
As mentioned above, government agencies conducting sensitive investigations often face additional requirements to maintain the confidentiality of the target of an investigation when utilizing commercially available datasets or those from private organizations. This means investigators are not permitted to use the web interfaces to query large datasets that data brokers typically provide. Instead, the agency must have the human and monetary resources to purchase, store, and protect that data, adding unnecessary data risk to the agency. A privacy-enhanced solution like Duality’s eliminates this burden. Software is deployed on both sides (the analyst/investigator and the data broker) to provide quantum-safe guardrails (fully homomorphic encryption) guaranteeing the privacy and confidentiality of the data, the subject of the queries, and the results. This allows data brokers not only to acquire more contracts, but to expand them, given the satisfaction of data localization requirements.
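The core idea these PETs rely on, computing on data that stays encrypted, can be illustrated in miniature. The sketch below uses the Paillier cryptosystem, which is only additively homomorphic (a far simpler scheme than the fully homomorphic encryption referenced above) and toy key sizes, purely to show two encrypted values being summed without ever decrypting the inputs:

```python
import math
import random

# Toy Paillier cryptosystem: additively homomorphic encryption.
# Demo-sized primes only; real deployments use 2048-bit+ moduli
# (and FHE schemes support far richer computation than addition).
p, q = 10007, 10009
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)   # Carmichael function λ(n)
mu = pow(lam, -1, n)           # μ = λ⁻¹ mod n (valid when g = n + 1)

def encrypt(m):
    """c = g^m · r^n mod n², with fresh randomness r per ciphertext."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """m = L(c^λ mod n²) · μ mod n, where L(x) = (x − 1) // n."""
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % n2

# Two parties encrypt their values; a third party sums them blind.
c_sum = add_encrypted(encrypt(12), encrypt(30))
total = decrypt(c_sum)   # 42, computed without exposing 12 or 30
```

The party holding `c_sum` learns nothing about the inputs, and whoever performs the addition never holds a decryption key: the same separation of duties that lets an investigator query a broker’s dataset without either side seeing the other’s sensitive material.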
To generate new revenue streams in the modern AI gold rush, look to the data you already pay to collect, store, use, and protect. Unlocking this value is more manageable than it was even a few years ago thanks to today’s privacy-enabled platforms. Such services turn a liability and cost center into a profit center. They can significantly expand existing operations, maximizing profits and increasing your addressable market by making compliance and security more straightforward and cost-effective than ever before.
Want to understand how to turn your data into an income-producing asset or expand the revenue stream you already have with modern privacy-enhancing technologies?
Contact us to see how your organization can accelerate and improve data monetization efforts.