Balancing Insight and Privacy

With recent advances in human health and genetics, people have unprecedented access to information about their health risks and possible interventions. Computing these diagnoses and recommendations requires data spanning genetics, physiology, and lifestyle, but most of us rightly think twice about sharing sensitive information. At first sight this seems like an insurmountable problem: how can we balance better medical care with privacy? Progress in computer science, cryptography, and security research offers a way forward.

Imagine you are a doctor who studies obesity and heart attack risk. One of the first things you might like to know is the age demographics of particular regions and countries. You could simply ask several million people for their date of birth, but most people would be reluctant to tell you. Among other uses, banks rely on this information to verify a person's identity. Would you give your date of birth to someone who just contacted you by email? Even if you trusted that person, you might still be inclined to fudge and round down a few years, since most people prefer to be younger.

Here is another way: the doctor could ask everyone to add a random number between -100 and +100 to their age and submit the result. So if your age is 27 and your random number is -12, you would subtract 12 from 27 and send the result, 15, to the doctor. The doctor would then not know your age, but she (or he) can still determine something very useful, such as the average age of the population. To do this, the doctor would average over many responses. Similar to how the mean displacement of a diffusing piece of dust is zero, the random terms would tend to cancel, provided they are drawn evenly from both sides of zero, gradually revealing a good estimate of the population’s true average age.
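The averaging idea above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any deployed system: the population is synthetic, and the names masked_response and estimate_mean_age, the uniform noise, and the age range of 18 to 90 are all assumptions made for the example.

```python
import random

def masked_response(age, rng, bound=100):
    """Return the age plus a uniform random integer in [-bound, bound]."""
    return age + rng.randint(-bound, bound)

def estimate_mean_age(responses):
    """Average the noisy responses; zero-mean noise tends to cancel out."""
    return sum(responses) / len(responses)

# Synthetic population (an assumption for illustration only).
rng = random.Random(0)
true_ages = [rng.randint(18, 90) for _ in range(200_000)]

# Each person submits only a masked value, never the true age.
responses = [masked_response(age, rng) for age in true_ages]

true_mean = sum(true_ages) / len(true_ages)
estimated_mean = estimate_mean_age(responses)

print(f"true mean age:      {true_mean:.2f}")
print(f"estimated mean age: {estimated_mean:.2f}")
```

A single response says little about the person who sent it, yet the estimate tightens as the number of responses grows, because the error of the average shrinks roughly with the square root of the sample size.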

This approach works well for research, but there is an obvious problem: in a typical medical context, a patient expects their doctor to tell them something about their own health and not just make abstract statements about global demographics.

Let’s add a few twists to the above approach. Here is a simplified example of secure multiparty computation. Imagine a doctor has developed a new algorithm for estimating cardiac risk. Let’s say it’s a simple addition, such as risk = age + 3. The patient, Alice, may wish to keep her age (= 27) secret and the doctor may wish to protect her algorithm (risk = age + 3). As shown in the figure, these two parties could agree to do the following. First, they could both subtract a random number from their secret (Step 1) and then share the differences (Step 2). Next, they could each add their random number to the other party’s share (Step 3) and then exchange the results of those calculations (Step 4). Finally, one (or both) parties could add the shares, revealing the patient’s risk (= 30) without, at any point, transmitting either the patient’s age or the doctor’s secret.

Mathematically, how does this work? If you keep track of all the terms, you see that risk = (age - rp) + rd + (wd - rd) + rp = age + (rp - rp) + (rd - rd) + wd. Here rp is the patient's random number, rd is the doctor's random number, and wd is the doctor's secret (3 in this example). The middle two terms evaluate to zero, simplifying the expression to the desired result, risk = age + wd. Essentially, both parties inject noise into a communications protocol, the noisy signal undergoes previously agreed-upon linear mathematical operations, and then the noise is removed at the very end. The fact that both parties remember their noise terms (rp and rd, respectively) and never disclose them is what protects the secrets.
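The four steps and the cancellation of the noise terms can be traced in a short Python sketch. It is a toy under stated assumptions: the function name run_toy_protocol and the range of the random masks are invented for illustration, and plain integers are used here, whereas practical protocols typically work with modular arithmetic.

```python
import random

def run_toy_protocol(age, wd, rng):
    """Trace the toy exchange; returns the risk and the messages sent."""
    # Each party draws a private random mask and never shares it.
    rp = rng.randint(-10**6, 10**6)  # patient's mask
    rd = rng.randint(-10**6, 10**6)  # doctor's mask

    # Step 1: each party subtracts its mask from its secret.
    patient_share = age - rp
    doctor_share = wd - rd

    # Step 2: the masked shares are exchanged.
    # Step 3: each party adds its own mask to the share it received.
    doctor_result = patient_share + rd   # computed by the doctor
    patient_result = doctor_share + rp   # computed by the patient

    # Step 4: the results are exchanged and summed; rp and rd cancel.
    risk = doctor_result + patient_result
    messages = (patient_share, doctor_share, doctor_result, patient_result)
    return risk, messages

rng = random.Random()
risk, messages = run_toy_protocol(age=27, wd=3, rng=rng)
print("risk:", risk)  # 30, regardless of the masks drawn
print("messages on the wire:", messages)
```

The messages printed at the end are the only values that cross the wire; each one is a secret offset by at least one random mask, and neither 27 nor 3 is ever transmitted on its own.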

This simplified example omits details required to make such systems work in practice. Most obviously, in this toy scheme the patient can immediately obtain the doctor’s secret, wd, by inverting the algorithm (risk - 27 = 3). As soon as the secrets on both sides become more complex, it becomes exceedingly difficult for either party to learn (or even estimate) the other party’s secrets. Multiplication requires additional steps, such as mathematically paired random numbers called Beaver Triples. If you have read to this point, you might be interested in learning more, or in joining us in Palo Alto or Hong Kong to build a secure foundation for digital health. To learn more, see the original 1991 publication on Beaver Triples (Efficient Multiparty Protocols Using Circuit Randomization); if you would like to join us, please also reach out.
