Google’s Window Into The Healthcare IT Market

by

Market Analyst, Software Advice

One of the major goals of the federal government’s push for nationwide electronic medical record (EMR) adoption is to create an information network where “health data can flow freely, privately, and securely to the places where they are needed.” So far, this is proving to be a challenge for the nation’s hospitals and doctors.

Software Advice thinks that this problem presents an opportunity for Google to take a big step into the healthcare IT market in 2010, following other major companies like Microsoft, I.B.M. and insurance giant Aetna. Through their Books project, Google has shown that they can scan, interpret and index a high volume of books in a relatively short amount of time. Unstructured medical records – those not neatly organized within an interoperable EMR system – could be managed in the same fashion. Google possesses many of the requisite skills and technologies to solve this problem.

However, to be successful, Google will have to figure out these issues:

  • How to gather structured and unstructured medical data on a large scale;
  • How to share and make that data accessible (searchable) to people; and,
  • How to comply with privacy regulations.

With Google Health rumored to be on the back burner, working with hospitals and medical providers to aggregate and organize medical data could be Google’s window into the growing market that is healthcare IT. Here's how they can do it.


The Benefits of Digital, Private & Secure Health Data
The driving force behind the government’s $19 billion EMR incentive program is that medical record software truly can transform the United States’ healthcare system for the better. EMR advocates have long touted the software’s ability to reduce medical errors, improve clinical decision making, empower patients, and reduce the costs of a bloated system.

When medical data is in digital form, it can be sorted, searched and analyzed more efficiently than paper charts. When implemented correctly, EMR software beats paper charts in efficiency, accuracy and cost savings. The problem Google could fix is that a majority of health data in the U.S., both historical and current, is still in paper form.

Structured & Unstructured Data
Medical data comes in essentially two forms: structured and unstructured. Structured data is information that comes in numbers, tables and rows, for example. It’s data that is disciplined and predictable. In the medical world, examples of structured data include insurance codes, diagnosis codes and messages formatted to HL7 standards. Structured data, relative to unstructured data, is easier to aggregate and analyze.

For example, if a user needs to connect two systems that store structured data in two different formats, a “middleware” application is an option. Middleware sits “in the middle” of the two systems, allowing them to share information. There are a number of companies in the health IT marketplace today that connect disparate data systems via middleware.
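The middleware idea can be illustrated with a minimal sketch. The two record layouts below are hypothetical, invented only for illustration, and do not correspond to any real vendor format or to HL7; the point is simply that a small translation layer maps one system's fields and date format onto the other's.

```python
from dataclasses import dataclass


# Hypothetical layout used by "System A" (not a real vendor format).
@dataclass
class SystemARecord:
    patient_id: str
    dx_code: str      # diagnosis code (placeholder values only)
    visit_date: str   # formatted as MM/DD/YYYY


# Hypothetical layout used by "System B" (not a real vendor format).
@dataclass
class SystemBRecord:
    mrn: str             # medical record number
    diagnosis: str
    encounter_date: str  # formatted as YYYY-MM-DD


def a_to_b(rec: SystemARecord) -> SystemBRecord:
    """Translate a System A record into System B's layout."""
    month, day, year = rec.visit_date.split("/")
    return SystemBRecord(
        mrn=rec.patient_id,
        diagnosis=rec.dx_code,
        encounter_date=f"{year}-{month}-{day}",
    )


if __name__ == "__main__":
    print(a_to_b(SystemARecord("12345", "DX-001", "03/15/2010")))
```

Real middleware also has to handle validation, missing fields and code-set mapping, which this sketch leaves out.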

Gathering unstructured data and turning it into a structured format, however, is not so easy. Unstructured medical data includes handwritten notes and charts, and medical images such as x-rays and CT scans. This data can be further categorized as textual unstructured data and non-textual unstructured data, respectively.

Currently, medical transcriptionists and document scanning services use a combination of human review and optical character recognition (OCR) to produce structured data out of unstructured medical information. This method is expensive and time-consuming.

How To Gather & Store This Data
So, how can Google go about turning unstructured data into structured data on a large scale?

In the case of textual unstructured data, Google’s reCAPTCHA program could be the answer to converting it into a structured format. CAPTCHA programs, boxes that ask a user to identify distorted words in order to proceed past a certain point, are becoming ubiquitous on the web as a way to fight spam. Google uses its reCAPTCHA program to transcribe books, old radio shows and newspaper articles by asking users to identify one word already known to Google and one previously unknown word. The unknown words come from a list of words that OCR programs were unable to read. If a user types the known word correctly, for example “overlooks” in the image above, reCAPTCHA assumes that what the user types for the unknown word, “inquiry,” is also correct. The unknown word continues to be shown to other users to increase reCAPTCHA’s confidence that the transcription is correct.
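The known-word and unknown-word mechanism described above can be sketched roughly as follows. This is an illustration under assumptions, not Google's actual implementation: the class name, the vote threshold of three and the 75% agreement ratio are all hypothetical.

```python
from collections import Counter, defaultdict


class WordVerifier:
    """Toy model of the control-word / unknown-word scheme (hypothetical)."""

    def __init__(self, min_votes=3, min_agreement=0.75):
        # Both thresholds are assumptions chosen for illustration.
        self.min_votes = min_votes
        self.min_agreement = min_agreement
        self.votes = defaultdict(Counter)  # unknown word id -> answer counts

    def submit(self, control_answer, control_truth, unknown_id, unknown_answer):
        """Count the unknown-word answer only if the control word was right."""
        if control_answer.strip().lower() != control_truth.lower():
            return False  # user failed the known word; discard the guess
        self.votes[unknown_id][unknown_answer.strip().lower()] += 1
        return True

    def resolved(self, unknown_id):
        """Return the agreed transcription, or None if confidence is too low."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        total = sum(counts.values())
        word, n = counts.most_common(1)[0]
        if total >= self.min_votes and n / total >= self.min_agreement:
            return word
        return None


if __name__ == "__main__":
    v = WordVerifier()
    for _ in range(3):
        v.submit("overlooks", "overlooks", "word-1", "inquiry")
    print(v.resolved("word-1"))
```

Requiring several agreeing answers before accepting a word is what lets the scheme tolerate the occasional wrong guess.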

If Google is doing this with books and newspapers, why not with handwritten medical charts and notes? The same logic applies – scan and upload individual words from handwritten medical data to a CAPTCHA program, let humans transcribe them over the web, and over time textual unstructured data becomes structured data. Google could theoretically let the 200 million CAPTCHAs filled out each day on the web work towards transcribing medical records.

Perhaps the most impressive fact about reCAPTCHA is that its accuracy rate is 99.5%, which is equivalent to that of a human transcriber. It’s not a stretch of the imagination to envision a system where medical providers can upload their paper documents and have them transcribed by Internet users.

Finally, Google is well-suited for this project because of the huge amount of digital storage space they have in their 30+ data centers around the world. Hosting this data in the cloud and storing it on super efficient servers means doctors could access a patient’s EMR more quickly than if the data were stored locally. We’ll touch on the privacy issues of storing medical data in the “cloud” in a moment.

Making Medical Data Usable
Let’s assume that Google can use their reCAPTCHA program to transcribe unstructured medical records over time, in addition to collecting structured data through specifications such as the Continuity of Care Record (CCR) and Continuity of Care Document (CCD). How do they make that information easily accessible to humans?

Part of the answer lies in Boston, MA. A team of researchers at Massachusetts General Hospital has created a system that pulls medical data from different sources within the hospital’s electronic medical record software and presents it in a logical and user-friendly format. It’s called the Queriable Patient Inference Dossier (QPID). Here’s how it works:

While Google's PageRank system works by giving more weight to pages that are linked to more often, EMRs don't have links and therefore cannot employ that approach. Instead, the dossier system has the ability to "learn" certain types of searches from its users, understanding that a search for "squamous cell carcinoma" and another search for "lung cancer" are actually seeking the same information.
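One way a system could "learn" that two searches seek the same information is to track which records users open after each query and treat queries with heavily overlapping results as related. The sketch below illustrates that general idea only; it is an assumption for illustration, not a description of how QPID actually works, and the class name, document ids and 0.5 threshold are hypothetical.

```python
from collections import defaultdict


class QueryLog:
    """Toy model: relate queries by the records users opened (hypothetical)."""

    def __init__(self):
        self.opened = defaultdict(set)  # query -> ids of records opened

    def record(self, query, document_id):
        self.opened[query.lower()].add(document_id)

    def similarity(self, q1, q2):
        """Jaccard overlap between the record sets opened for two queries."""
        a = self.opened.get(q1.lower(), set())
        b = self.opened.get(q2.lower(), set())
        if not a or not b:
            return 0.0
        return len(a & b) / len(a | b)

    def related(self, query, threshold=0.5):
        """Other logged queries whose opened records overlap enough."""
        q = query.lower()
        return sorted(
            other for other in self.opened
            if other != q and self.similarity(q, other) >= threshold
        )


if __name__ == "__main__":
    log = QueryLog()
    log.record("lung cancer", "note-1")
    log.record("lung cancer", "note-2")
    log.record("squamous cell carcinoma", "note-1")
    log.record("squamous cell carcinoma", "note-2")
    log.record("broken arm", "note-9")
    print(log.related("lung cancer"))
```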

The QPID system uses natural language processing (NLP) to “learn” the relationships between words. Sophisticated NLP tools, often associated with artificial intelligence, allow a computer to read and interpret text as if it were human. In short, they use complex statistical models to predict the correct spelling and order of words in a sentence.
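As a rough illustration of what a statistical model of word order means, the sketch below counts which words follow which in a tiny sample corpus (a bigram model) and uses those counts to choose between two candidate spellings. The corpus and function names are made up for illustration; production NLP tools are far more sophisticated and are trained on vastly more text.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return counts


def pick(counts, prev, candidates):
    """Choose the candidate seen most often after the previous word."""
    return max(candidates, key=lambda w: counts[prev.lower()][w.lower()])


if __name__ == "__main__":
    # Made-up sample sentences, for illustration only.
    corpus = [
        "patient denies chest pain",
        "patient reports chest pain on exertion",
        "the window pane was broken",
    ]
    model = train_bigrams(corpus)
    print(pick(model, "chest", ["pane", "pain"]))
```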

Google just recently announced they were ceasing development of their Google Wave project, which uses NLP tools as part of its spell check system. Google’s NLP tools are particularly effective because they are developed using data from billions of Google web searches. This makes Google’s language and statistical models particularly powerful across a number of languages. Also, in a bit of an odd twist, two Google researchers are set to release a white paper about using Google Wave’s protocol to aggregate medical data.

So, if Google works with hospitals and other medical providers to transcribe handwritten medical documents, combines those with structured medical data, and applies their powerful NLP tools, they could end up with a much more robust QPID-style program than the one the Massachusetts General team created.

Complying With Privacy Regulations
The Health Insurance Portability and Accountability Act (HIPAA) is the United States’ guiding document when it comes to safeguarding protected health information (PHI). The 1996 piece of legislation requires any “covered entity” that manages protected health information to have administrative, technical and physical safeguards against a breach of data. A covered entity is defined as:

  • A health care provider that conducts certain transactions in electronic form;
  • A health care clearinghouse; or,
  • A health plan.

Google Health, the company's personal health record project, allows consumers to add their health information to a digital record online, import prescription information from pharmacies and share that record with their doctor. Currently, Google argues that they're not covered by HIPAA because they're essentially acting as a free online repository, and not transmitting health information electronically themselves.

If Google were to start organizing medical records in the fashion we've described, they would have to conform to HIPAA standards. With dozens of Web-based EMR vendors, who store medical records online, already successfully complying with HIPAA, we don't feel that compliance would present a major issue for Google.

Fulfilling Google's Mission Statement
Gathering the United States’ medical data and making it digitally accessible would be perhaps the greatest fulfillment of Google’s mission statement – “To organize the world's information and make it universally accessible and useful.”

The tools are in place to make it happen. Google has shown they have the will to take on a project of this size. The Google brass will have to decide if the benefits outweigh the costs of a digital healthcare system.

Image in blog post originally created by MC4 Army.

 
  • http://www.neuralware.org Neuralware

    You name some very interesting concepts. But as far as the reCaptcha is concerned, I don’t think this concept will work as well for medical data. For two reasons: an MD’s handwriting is almost unreadable most of the time, especially for non-MD’s. Furthermore, an average person will not know, and thus not recognize all medical terms. This will result in much lower recognition rates.

    My two cents :)

  • http://www.practicevelocity.com David Stern

    This post contains quite a few interesting ideas. Practice Velocity, LLC (www.practicevelocity.com) has been a pioneer in utilizing technology to gather information from scanned paper templates for urgent care physicians. The patented technology in PiVoT allows the computer system to read fields on a paper template to collect coding, provider and quality assurance data. It is a much bigger step, however. The problem with using any technology to “manufacture” an official medical record is that it will require a physician to review and sign the record before it can be a legal document. Another problem with this concept is the fact that physician handwriting is notoriously bad. In my experience, many times the physician cannot read his or her own handwriting. We have not been able to find an OCR program that will be able to get anywhere near 99% accuracy with some physicians’ handwriting. The proposed reCAPTCHA concept is a very intriguing method to improve OCR technologies. One has to wonder how well this would work with physician handwriting and technical medical words out of context. It would almost certainly improve results of current OCR technologies. Paper templates scanned into a computer system can be an effective method to transition a practice toward using an electronic medical record. We believe, however, that moving physicians toward direct documentation on a computer will be the long-term requirement for effective use of computers for medical record documentation. In addition, the federal government through the misnamed American Recovery and Reinvestment Act is forcing physicians (initially through a handout of up to $60,000 per physician and later through financial penalties) to implement electronic medical records that meet specific design requirements, requirements that preclude the envisioned approach.

    The initial regulations released by the federal government are interpreted by most industry insiders as strongly favoring Microsoft (over Google and other entities) as the platform provider for providing a single integrated medical record for each member of the US public. This may be one reason why Google has decided to focus its energies elsewhere.

  • http://blog.aperio.com Ole Eichhorn

    EMR represents a massive opportunity for companies like Google, absolutely.

    In addition to structured and unstructured data (text), there is a third kind of data: images. Managing images is a key part of any EMR, because images often represent the primary data physicians used to make a diagnosis and determine patient care. Some fields like Radiology and Cardiology routinely capture images and these are increasingly part of EMR systems. The field of Pathology (where more diagnosis is made than in any other) is now also using images; the advent of digital pathology is transforming that field. And integration with EMR systems is one of the drivers…

  • http://tcaruso2.blogspot.com/ Tom Caruso

    Thank you for this interesting idea. Google’s use of reCaptcha is quite creative.

    Even if you can read the handwriting or audio notes of MDs, once you have the unstructured text in a machine-readable form, you are faced with several other problems:

    1.) Maintaining the privacy of the data. Natural language processing will never be able to identify 100% of the private health information (PHI) in unstructured text, making any use of this information in unstructured format limited to consented use.

    2.) Relating one clinician’s unstructured text to another’s. Since no selection lists were used in forming the key words in the unstructured text, you cannot relate identical key words from two different authors.

  • David Nicholson

    Interesting article with some neat ways to make old paper charts accessible electronically.

    My primary concern stems from statements like this, “When implemented correctly, EMR software beats paper charts in efficiency, accuracy and cost savings.” There is scant evidence that any of the efforts underway now will produce those results, especially on a national scale of interoperability, which is a long way off.

    Many of the cool technological tools that exist today could be used in health data management. But unlike most solutions from the technology world, very few in health care are the ones writing the code or designing the systems, yet we are the ones who encounter the problems every day. Take the new technology from Square as an example. A group of developers who actually buy stuff with credit cards saw a problem and set about to solve it. They are uniquely positioned to solve the problem because they actually buy stuff with credit cards. So they developed a piece of hardware and the software to power it so credit card transactions can happen anywhere or through anyone who owns an iPhone. Not so in health care. It is like a constant barrage of other people telling us what or how to do what we do. Mostly, it’s not very helpful. So unless more inspired developers work in concert with providers, we will muddle through this process.
