This is the web version of Eye on A.I., Fortune’s weekly newsletter covering artificial intelligence and business. To get it delivered weekly to your in-box, sign up here.
I spent part of last week listening to the panel discussions at CogX, the London “festival of A.I. and emerging technology” that takes place each June. This year, due to Covid-19, the event took place completely online. (For more about how CogX pulled that off, look here.)
There were tons of interesting talks on A.I.-related topics. If you didn’t catch any of it, I’d urge you to look up the agenda and try to find recordings of the sessions on YouTube.
One of the most interesting sessions I tuned into was on privacy-preserving machine learning. This is becoming a hot topic, particularly in healthcare, where the coronavirus pandemic is accelerating interest in applying machine learning to patient records.
Currently, the solution to preserving patient privacy in most datasets used for healthcare A.I. is to anonymize the data: In other words, personal identifying information such as names, addresses, phone numbers, and social security numbers is simply stripped out of the dataset before it is fed to the A.I. algorithm. Anonymization is also the standard in other industries, especially those that are heavily regulated, such as finance and insurance.
But researchers have shown that this kind of anonymization doesn’t guarantee privacy: There are often other fields in the data, such as location, age, or occupation, that might allow you to re-identify an individual, especially if you are able to cross-reference the dataset with another one that does include personal information.
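To see how that kind of linkage attack works in practice, here is a minimal sketch in Python. All of the records, names, and field values below are invented for illustration; the point is that age, ZIP code, and occupation together can act as a fingerprint even after names are stripped out.

```python
# Even with names removed, quasi-identifiers such as age, ZIP code, and
# occupation can link an "anonymized" record back to a person via a
# second dataset that still carries identities. All data is made up.

anonymized_health_records = [
    {"age": 34, "zip": "10001", "occupation": "teacher", "diagnosis": "asthma"},
    {"age": 61, "zip": "94110", "occupation": "engineer", "diagnosis": "diabetes"},
]

public_directory = [
    {"name": "A. Jones", "age": 34, "zip": "10001", "occupation": "teacher"},
    {"name": "B. Smith", "age": 52, "zip": "73301", "occupation": "chef"},
]

def reidentify(record, directory):
    """Return names whose quasi-identifiers match the 'anonymous' record."""
    keys = ("age", "zip", "occupation")
    return [person["name"] for person in directory
            if all(person[k] == record[k] for k in keys)]

matches = reidentify(anonymized_health_records[0], public_directory)
print(matches)  # a unique match re-identifies the patient
```

A single unique match is enough: the attacker now knows A. Jones's diagnosis, even though the health dataset never contained a name.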
Privacy-preserving machine learning, by contrast, promises much more security—in fact, most methods offer mathematical certainty that the individual records cannot be re-identified by the person training or running the A.I. algorithm. But it comes with trade-offs. (One big disadvantage is that some privacy-preserving methods are less accurate. Another is that some require more computing power or take longer to run.)
Last week, Eric Topol, the cardiologist who is both a huge believer in the potential for A.I. to transform healthcare and a notable skeptic of the hype so far about A.I. in healthcare, took to Twitter to highlight a paper published in Nature on the potential use of federated learning, a privacy-preserving machine learning technique, to build much larger and better-quality datasets of medical images for A.I. applications.
As the CogX panelists noted, the ability to draw insights from large datasets without compromising critical personal information is of potential interest far beyond healthcare: It could help industries create better benchmarks without compromising competitive information, or help companies serve their customers better without having to collect and store vast amounts of personal information about them.
Blaise Thomson, who is the founder and chief executive officer of Bitfount, a company creating software to enable this kind of insight-sharing between companies (and who sold his previous company to Apple), went so far as to say that privacy-preserving A.I. could strike a blow against monopolies. It could, he argued, help reverse A.I.’s tendency to reinforce winner-takes-all markets, where the largest company has access to more data, cementing its market leadership. (He didn’t mention any names, but ahem, Google, and, cough, Facebook.) Thomson is a fan of a privacy-preserving method called secure multi-party computation, in which several parties jointly compute a result, such as a trained model, without any of them revealing their underlying data to the others.
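The basic trick behind multi-party computation can be shown with additive secret sharing, one of its standard building blocks. In this sketch, two hypothetical companies want the total of their private figures without disclosing them: each splits its value into random-looking shares, and only the shares are combined. The numbers and the modulus are illustrative.

```python
import random

# Additive secret sharing: each party splits its private value into
# random shares that individually look like noise, yet all the shares
# together sum to the true value (modulo a large number).

MOD = 2**31 - 1

def make_shares(secret, n_parties):
    """Split `secret` into n random shares that sum to it (mod MOD)."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MOD)
    return shares

# Two companies each hold a private figure they won't reveal.
private_values = [1200, 3400]
n = len(private_values)

# Each party shares out its value; party i keeps share i of every secret.
all_shares = [make_shares(v, n) for v in private_values]

# Each party locally sums the shares it holds, then the partial sums
# are combined: the total is revealed, the individual inputs never are.
partial_sums = [sum(all_shares[p][i] for p in range(n)) % MOD
                for i in range(n)]
total = sum(partial_sums) % MOD
print(total)  # 4600
```

No single share leaks anything about the value it came from, which is what lets competitors compute a joint statistic, a benchmark, say, without handing over their books.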
M.M. Hassan Mahmoud, the senior A.I. and machine-learning technologist at the U.K.’s Digital Catapult, a government-backed organization that helps startups, explained federated learning. It functions as a network, where each node retains all its own data locally and uses that data to train a local A.I. model. Aspects of each local model are shared with a central server, which uses the information to build a better, global model that is then promulgated back down to the nodes.
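The scheme Mahmoud described can be sketched in a few lines of Python. This is a toy version of federated averaging, not any vendor's actual platform: each node fits a one-parameter model (y = w·x) on data it never shares, and the server only ever sees and averages the model parameter. The nodes, data points, and hyperparameters are all invented for illustration.

```python
# Toy federated averaging: each node trains on its own data locally,
# only model parameters travel to the server, and the averaged global
# model is sent back down. True slope of the made-up data is ~2.0.

nodes = [
    [(1.0, 2.0), (2.0, 4.1)],               # node 1's private (x, y) pairs
    [(3.0, 5.9), (4.0, 8.2), (5.0, 9.9)],   # node 2's private (x, y) pairs
]

def local_train(w, data, lr=0.01, steps=20):
    """Gradient descent on squared error, using only this node's data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

w_global = 0.0
for _ in range(10):  # communication rounds
    # Each node refines the current global model on its local data...
    local_ws = [local_train(w_global, data) for data in nodes]
    # ...and the server averages the results, weighted by data size.
    sizes = [len(data) for data in nodes]
    w_global = sum(w * s for w, s in zip(local_ws, sizes)) / sum(sizes)

print(w_global)  # converges close to the true slope of ~2.0
```

The raw (x, y) pairs never leave their node; only the single parameter w crosses the network, which is the privacy property federated learning is built around. (Real systems add further protections, since model updates themselves can leak information.)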
The problem: Coordinating all that information sharing requires specialized software platforms, and right now, the different software systems for running federated learning from different vendors (Google has one, graphics chip giant Nvidia has one, and China’s WeBank has another) are not compatible. So Mahmoud’s team built, as a proof of concept, a federated-learning system that could function across all these softwares. “It’s a great time to, as a community, build a common, open, scalable core that can be trusted by everyone,” Mahmoud said.
The final panelist was Oliver Smith, the strategy director and head of ethics for Health Moonshot at Telefonica Innovation Alpha. That’s a branch of Telefonica, the Spanish telecommunications firm, that works on transformative digital projects, including, in this case, mobile apps to support people’s mental health. Smith said his group had investigated six different techniques for implementing privacy-preserving A.I. “My dream that we could take one technology and apply it to all of our use cases is not really right,” he concluded. Instead, each use case was probably best suited to a different technique.
But Smith was clear about the potential of the whole field: “All of these techniques hold the promise of being able to mathematically prove privacy,” he said. “This is much better than anonymization and that is where we need to get to.”
It’s clearly a trend that anyone implementing an A.I. system—especially one that deals with personal information—ought to be thinking hard about.
With that, here’s the rest of this week’s A.I. news.