Franz Kafka, meet Joseph Heller
Hold Your Enemies Close and Your Friends Closer

In January 2015, Ricardo Alonso-Zaldivar and Jack Gillum of The Associated Press reported that the health insurance site Healthcare.gov had been sharing user data with companies like Google, Twitter and Facebook, as well as with a host of online advertising providers. They wrote that the administration said it had prohibited companies “from using the data to further their own business interests” and that “there is no evidence that personal information has been misused.” However, Cooper Quintin at the Electronic Frontier Foundation, a civil liberties group, wrote that “sending such personal information raises significant privacy concerns.” A company that receives the information, he added, “could match up the personal data provided by Healthcare.gov with an already extensive trove of information” to create an extremely detailed profile of you and your interests. Moreover, he wrote, a company could connect Healthcare.gov data with users’ real identities.
In March 2015, Elizabeth Dwoskin wrote an article in the Wall Street Journal aptly titled “The Next Marketing Frontier: Your Medical Records.” She disclosed that when physicians who use EHR software from Practice Fusion view patient charts, a sponsored alert sometimes pops up to indicate that a patient is due for a vaccine or a particular treatment, for influenza or hepatitis B, among other ailments. Practice Fusion, which gives its software to doctors free of charge, is pioneering a new type of data-driven business and has built a database of 100 million patient records. It has begun to sell sponsorships for these alerts to drug companies, labs and insurance companies, matching preprogrammed alerts to patients in real time based on their health indicators and medical history, letting marketers deliver a crucial pitch at the moment when clinical decisions are being made. Some experts worry that the sponsored alerts blur the line between promoting health and marketing medicines. Practice Fusion, which has raised $157.5 million from investors, says about 112,000 health professionals, doctors and nurses are using its system, and the software logs about 5.5 million office visits a month.
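The real-time matching described here is, at its core, rule evaluation against a patient chart. The sketch below is a hypothetical toy, not Practice Fusion’s actual system; every field name and rule threshold is an invented assumption.

```python
# Hypothetical sketch of matching preprogrammed (sponsored) alerts to a patient
# chart at the moment it is opened. All fields and rules are illustrative
# assumptions, not any vendor's real logic.

ALERT_RULES = [
    {
        "name": "influenza vaccine due",
        "applies": lambda p: p["age"] >= 65
        and "flu_vaccine" not in p["vaccinations_this_season"],
    },
    {
        "name": "hepatitis B vaccine due",
        "applies": lambda p: p["diabetes"] and "hep_b" not in p["completed_series"],
    },
]

def alerts_due(patient):
    """Return the names of all alerts whose rule matches this chart."""
    return [rule["name"] for rule in ALERT_RULES if rule["applies"](patient)]

patient = {
    "age": 72,
    "vaccinations_this_season": set(),
    "completed_series": {"hep_b"},
    "diabetes": False,
}
print(alerts_due(patient))  # ['influenza vaccine due']
```

The commercial twist the article describes is simply that a sponsor pays for a given rule to be in the list; the matching itself is ordinary clinical decision support.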
Just this past December, Rebecca Robbins reported in the Boston Globe on some of the new data mining techniques used by insurers in a bid to figure out when you’re likely to get sick, ostensibly to design interventions that keep you healthy (and save themselves a lot of money in the process). Insurance companies are now paying data analytics companies such as GNS Healthcare and Predilytics to sift through huge quantities of medical records, genetic information, and personal information on everything from what model car you drive to how many hours you sleep, from which magazines you read to where you shop and what you buy. GNS will also rank patients by how much return on investment the insurer can expect if it targets them with particular interventions, such as sending a text message reminding them to refill a prescription or sending a nurse to their home for a checkup. According to Colin Hill, the chief executive of GNS, the algorithm can also tell the insurer not to waste time and money trying to get certain patients to take their pills, but to spend resources on other patients instead. But using an algorithm to determine how and when to intervene raises troubling risks, said Kirsten Martin, an assistant professor at George Washington University who studies business ethics and Big Data. Such analyses are only as good as the underlying data sources, which in numerous instances have proved deeply inaccurate, and as the algorithms used to mine them. Insurers say they don’t deny care to anyone based on algorithms, but merely use the data to customize their approach to each patient. Yet insurers surely have big vested commercial incentives to “monetize” this information, whether through rate increases, added constraints or denials. And as always, I worry that insurers are using all this highly personal, often sensitive, possibly inaccurate information without informed consent and with little transparency or accountability.
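Ranking patients by intervention ROI, as the article says GNS does, amounts to weighing expected savings against the cost of the intervention. This is a minimal fabricated sketch of that idea, not GNS’s model; every probability, cost and patient below is invented.

```python
# Toy sketch: rank patients by the expected return on investment of targeting
# them with an intervention. All numbers are fabricated for illustration.

def expected_roi(p_adherence_lift, avoided_cost, intervention_cost):
    """Expected net savings per dollar spent on the intervention."""
    expected_savings = p_adherence_lift * avoided_cost
    return (expected_savings - intervention_cost) / intervention_cost

patients = [
    # (id, estimated lift in adherence, cost avoided if adherent, intervention cost)
    ("A", 0.02, 500.0, 1.0),     # text-message refill reminder
    ("B", 0.10, 8000.0, 150.0),  # in-home nurse visit
    ("C", 0.00, 8000.0, 150.0),  # model predicts no response: money wasted
]

ranked = sorted(patients, key=lambda row: expected_roi(*row[1:]), reverse=True)
for pid, lift, avoided, cost in ranked:
    print(pid, round(expected_roi(lift, avoided, cost), 2))
# A 9.0 / B 4.33 / C -1.0 -> the algorithm says to skip patient C entirely
```

Note how a negative ROI is exactly the “don’t waste time and money on certain patients” outcome Hill describes; the ethical worry is that the inputs feeding such a score may be wrong or unconsented.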
Finally, IBM just announced the $2.6 billion purchase of Truven Health Analytics, which has data on the cost and treatment of more than 200 million patients. IBM is looking to enhance the growth of its Watson Health business, and to that end has now purchased four companies since it created the unit last April, at a total expenditure of more than $4 billion. Two other acquisitions, Explorys, a spinoff from the Cleveland Clinic, and Phytel, a Dallas-based maker of software to manage patient care, also brought with them significant data assets, mostly from patients’ electronic medical records. The $1 billion purchase of Merge Healthcare, a medical-imaging software company, added expertise in managing health image data. Truven contributes vital payment information on patients, including detailed coding on disease types, diagnoses and drugs prescribed. The Watson Health business, IBM said, now has health-related data on “approximately 300 million patient lives,” mostly in the United States. The goal is to run the patient data through Watson’s artificial intelligence (A.I.) software, so that it works as a specialized digital assistant to physicians and health administrators to improve care and curb costs. Now I am optimistic that the vast majority of the IBM researchers are interested in the scientific and A.I. opportunities in this project. But look at the dollar values involved here. More crucially, who has allowed the intermediary companies here to obtain and aggregate our medical and health records in such volumes, with such specificity, and trade them like stocks and bonds? Again, I have very serious concerns about data protection, anonymity, and sales of these data to other companies (such as insurers or marketers) with more mercenary or insidious interests.
The frank, large-scale activity here shows wanton disregard for our privacy rights, especially given the extent of data hacking, the special value of medical data, and the failures of anonymization described below. I could understand the handing off of records between subsystems strictly within a highly secure, closed network, with no outside commercial forces in play. But the present context appears to be light years away from such a place.
Anonymization with Plausible Deniability

Even when real names and other personal information are deleted from large data sets, it is often possible to use just a few pieces of information to identify a specific person, according to a study published last year in the journal Science. A group of data scientists from the M.I.T. Media Lab analyzed credit card transactions made by 1.1 million people over a three-month period. Although the information had been “anonymized” by removing personal details like names and account numbers, knowing just four random pieces of metadata was enough to uniquely re-identify 90 percent of the individuals. The study certainly calls into question the standard methods many companies and systems currently use to anonymize their records. As the authors wrote: “A data set’s lack of names, home addresses, phone numbers or other obvious identifiers does not make it anonymous nor safe to release to the public and to third parties.” In a 2013 study, Latanya Sweeney similarly demonstrated that researchers were able to re-identify patients by name in a supposedly anonymized hospitalization data set. Frank Pasquale, a law professor at the University of Maryland, has written an important book on the dark side of hidden algorithms, automated judgments and one-way mirrors (corporations watching individuals), entitled The Black Box Society: The Secret Algorithms That Control Money and Information, in which he discusses this issue within a larger context.
As he says, we should not necessarily be reassured: “There’s a big literature out there on broken promises of anonymization, of efforts where users were assured that the information was anonymized, but it wasn’t really anonymized well.” Pasquale is very concerned about “the spillage of data from one context into others,” especially commenting that “there’s high demand for health data out there.” Life insurance companies, for instance, “want to use everything on you to calculate what your life insurance premium should be.” Hmm – I think that this links up rather naturally to Rebecca Robbins’ report in the Boston Globe on data mining by insurers. Should we be concerned?
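The mechanism behind the Science result is worth seeing concretely: in a rich data set, even a handful of attributes combine into a pattern that is unique to one person. The toy sketch below (fabricated records, invented attributes) simply counts how many rows are unique on a chosen set of columns.

```python
# Toy illustration of re-identification risk: fraction of "anonymized" records
# that are uniquely pinned down by a small set of quasi-identifiers.
# The records are fabricated for illustration.
from collections import Counter

records = [
    # (zip3, birth_year, sex, shop_visited)
    ("021", 1971, "F", "cafe"),
    ("021", 1971, "F", "gym"),
    ("021", 1984, "M", "cafe"),
    ("100", 1971, "F", "cafe"),
    ("100", 1990, "M", "pharmacy"),
]

def fraction_unique(rows, fields):
    """Fraction of rows uniquely identified by the chosen attribute indices."""
    counts = Counter(tuple(r[i] for i in fields) for r in rows)
    return sum(1 for r in rows if counts[tuple(r[i] for i in fields)] == 1) / len(rows)

print(fraction_unique(records, [0, 1, 2]))     # 0.6 with three attributes
print(fraction_unique(records, [0, 1, 2, 3]))  # 1.0 after adding one more
```

Each added piece of metadata shrinks the crowd a record can hide in, which is why four points sufficed to single out 90 percent of 1.1 million people.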
Vulnerability with a King-size “V”

As we now know, EHRs have proved double-edged: one byproduct has been the loss of patient privacy and of the security of personal health information, in profound contrast to our old-fashioned paper charts that were previously stored in an office or hospital basement. We all appreciate the potentially great advantages that computerization can afford, but much more protection is imperative on this front. Many disclosures within the last couple of years underscore the extent of the concern: we truly live in an age of acute cyber vulnerability. The long list of both private companies and government organizations that have been hit includes Target (70 million), Home Depot (50 million), the health insurer Anthem (80 million), Premera Blue Cross (11 million), and the U.S. Office of Personnel Management, or OPM (4 million federal employees). In many of these breaches, particularly those involving health care data, the stolen files include “huge treasure troves of personal data,” to borrow the phrase a Washington Post article used last year to characterize the OPM breach. It turns out that in many of the breaches, the affected organizations had failed to take even basic steps to secure their computer networks. This has at times been attributed to “a lack of management focus on the potential problems.” What this really means is that the organizations did not want to budget funds or time for proper protection, because profit margins would be lowered and/or the cost of products might have to be raised slightly, placing them at a “competitive disadvantage.”
Some additional numbers worry me even more. Security experts have warned that further attacks on health care organizations are likely because of the especially high value of medical data on the black market. In black market auctions, complete patient medical records tend to sell at much higher prices than credit card numbers. One security expert said that at one auction credit card records sold for 33 cents apiece, whereas patient medical records sold for $251, roughly 760 times as much. In another, more cautious estimate, law-enforcement officials put credit card numbers at $6 or $7 versus about $50 for health care records, still nearly a tenfold difference. A study published last year in the Journal of the American Medical Association found that between 2009 and 2013, more than 29 million medical records were hacked, stolen or otherwise compromised. Chillingly, about 90 percent of health care organizations reported at least one data breach over the last two years, according to a survey of health care providers published last year by the Ponemon Institute, a research concern.
These “bobbles” have real consequences, oftentimes in the form of medical identity theft. According to a survey published last February by Ponemon, such theft affected 2.3 million adult patients in 2014. It can lead to loss of health insurance, collection notices from hospitals, and diminished credit scores. In a twist on identity theft, crooks can use stolen personal data to get their own health care, prescriptions and medical equipment, which can result in the thief’s health data being folded into the victim’s own medical charts. Confusion or errors can ensue, leading to dangerous diagnoses or treatments. Finally, adding insult to injury, a victim often cannot fully examine or repair his or her own records, because the thief’s health data, now folded into the victim’s, is protected by federal medical-privacy laws.
I know – just lots of big numbers. Until you or a loved one gets hit, that is.