Tuesday, February 12, 2013
Present: Abby Crocker, Kairn Kelley, Amanda Kennedy, Rodger Kessler, Ben Littenberg, Charlie MacLean, Connie van Eeghen
Guest: Steve Kappel
1. Start Up: Introductions among mutually excited data hounds, some of whom have the start of a research question (e.g. use of opiate medications) while others have overlapping research interests and questions. Some also have an interest in using large data sets for education, QI interventions, and opportunities for prospective provider interventions.
2. Presentation: Steve Kappel: Understanding/using VHCURES: The Kingdom of Messy Data
a. We know the data base is an excellent source for paid pharmaceutical claims from all payers. Some key considerations:
i. A claim is a small “chunk” of clinical info wrapped inside financial data: who received the service, the service, who paid for it, and some data about the patient
ii. OnPoint is the vendor that assembles the data and tries to identify patients/providers consistently – they are good, not great, at this. The payers individually create the identifiers; this leads to a lot of variability among payers.
1. Refreshed quarterly and provided directly by OnPoint. Lag: paid claims are posted by the end of the quarter. In general, data are up to date as of 6 months prior to the request date. Right now: data are up to date for all of 2011 and the first quarter of 2012.
2. Provider names vary greatly; NPIs are pretty good, although there are many for organizations and individual providers. Should be clean in the next few months.
3. Patient data include date of birth; Charlie is requesting access through IRB. Birth date, zip code, and gender generate very good matches everywhere except for Burlington.
iii. Connecting claims across patients is good, not great. There are no actual names or SSNs; these fields are encrypted consistently, but the encrypted values will differ if the underlying entries do not match exactly.
1. SSN is frequently missing; insurers are increasingly unwilling to use it (30% have no SSN)
2. Referring provider information is not carried into the claim.
3. Prescribing provider is available, but not the clinical reason for prescribing.
4. Data base validation is needed: large scale chart review is being planned using PRISM clinical data and FAHC claims data (electronic to electronic) – which is a limited form of validation.
5. Babies and mothers should be linked through subscriber information, as well as related claims data
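A minimal sketch of why consistently encrypted name/SSN fields only link when the underlying entries match exactly. SHA-256 is used here purely as a stand-in (the notes do not say which method is actually applied), and the `normalize` helper and sample names are hypothetical.

```python
import hashlib


def encrypt_identifier(value: str) -> str:
    """Stand-in for a consistent one-way encryption (SHA-256 is an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def normalize(value: str) -> str:
    """Hypothetical cleanup step: trim, collapse whitespace, and uppercase."""
    return " ".join(value.split()).upper()


# The same person as two payers might have typed the name (made-up example).
raw_a = "Jane Q. Public"
raw_b = "jane q. public "

# Identical input always yields identical output, but any difference in
# case or spacing produces a completely different encrypted value.
same_raw = encrypt_identifier(raw_a) == encrypt_identifier(raw_b)

# If the entries were cleaned before encryption, the values would agree.
same_normalized = (
    encrypt_identifier(normalize(raw_a)) == encrypt_identifier(normalize(raw_b))
)

print(same_raw, same_normalized)
```

Because the encryption happens before the data reach the analyst, this kind of cleanup is only possible upstream; downstream, linkage has to rely on other fields such as birth date, zip code, and gender.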
b. Data history starts in 2007, for all claims from almost any insurer (85%) including TPAs (self-insured), Medicaid, and out of state payers for Vermont resident beneficiaries. Medicare is in the process of being included: they have released primary care medical home claims (not to be released to anyone else). Medicare data should be completely available in one year.
i. Medicare: 65 and over, disabled children, ALS, ESRD – these are absent.
ii. Dual eligibility: can be identified, but no Medicare claims
iii. Non-Medicare: includes all covered expenses, except self-pay. Includes those covered by the deductible; does not include denied claims.
iv. Claims with very small dollar values are usually wrap-around (secondary) insurance coverage. Easy to flag the primary paid claim.
v. Claims with $0 value are those paid as part of deductibles.
vi. Claims with negative values are adjustments – complicated reworking of reversals and re-processing. These are separate transactions in BCBS; OnPoint bundles these together – which makes it hard to replicate data across time, as adjustments often occur in later quarters.
vii. Claims are also affected by what is covered: some diagnoses are paid more easily than others; this affects claims documentation
1. Example: it is hard to find diabetes on a medical claim – because this doesn’t affect the reimbursement. But the existence of the diabetes diagnosis affects the medical claims generated for the patient. This diagnosis must be inferred from other patterns that are evident from claims data (meds, tests, and procedures)
2. Can be used for comparative analyses: patients that appear to have diabetes and those that don’t, with the resulting differences in utilization and cost
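The diabetes example above can be sketched as a person-level flag followed by a simple comparison. This is only an illustration: the record layout, field names, dollar amounts, and the short marker-drug list are all hypothetical, and a real definition would need clinically validated criteria (meds, tests, and procedures).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical claim records; field names are illustrative, not the VHCURES schema.
claims = [
    {"person_id": "A", "type": "pharmacy", "drug": "metformin", "paid": 12.0},
    {"person_id": "A", "type": "medical", "drug": None, "paid": 250.0},
    {"person_id": "B", "type": "medical", "drug": None, "paid": 90.0},
    {"person_id": "C", "type": "pharmacy", "drug": "amoxicillin", "paid": 8.0},
    {"person_id": "C", "type": "medical", "drug": None, "paid": 110.0},
]

# Illustrative marker list only (an assumption, not a validated definition).
DIABETES_MARKER_DRUGS = {"metformin", "insulin glargine"}


def flag_people(claims, marker_drugs):
    """Return the set of people with at least one pharmacy claim for a marker drug."""
    return {
        c["person_id"]
        for c in claims
        if c["type"] == "pharmacy" and c["drug"] in marker_drugs
    }


def total_paid_by_person(claims):
    """Sum paid amounts across all of each person's claims."""
    totals = defaultdict(float)
    for c in claims:
        totals[c["person_id"]] += c["paid"]
    return dict(totals)


flagged = flag_people(claims, DIABETES_MARKER_DRUGS)
totals = total_paid_by_person(claims)

# Compare average total paid for people who appear to have diabetes vs. those who don't.
mean_flagged = mean([t for p, t in totals.items() if p in flagged])
mean_unflagged = mean([t for p, t in totals.items() if p not in flagged])

print(flagged, mean_flagged, mean_unflagged)
```

The flag is attached to the person, so all of that person's claims (not just the pharmacy claim that triggered it) count toward the comparison.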
c. Requests for data need to address these issues as “inclusion criteria,” with the additional requirement of a plan to link claims together
i. Clean requests: $ spent for an easy-to-find diagnosis on claims
ii. Less clean: $ spent for diagnosis recorded elsewhere
iii. Even less clean: connecting providers to patients for a diagnosis
iv. Pharmaceuticals: can track the history of medication claims, although this is messy because a patient’s insurance payer can change over time and out of pocket expenditures are excluded
1. Example: we can look for people with a pain-related problem (like hip replacement), remove the patients with previous long term opiate use, and look forward to find subsequent use of opiates
2. Another: we can look for presentation to ED for musculo-skeletal injury, not already on narcotics for the previous 12 months; question is “how often do people ‘get stuck’ on opiates from a cold start?” This is similar to studying the incidence (not the prevalence) of chronic pain managed by opiates.
v. Exclusions can be organized at the personal level (not the claim level), in which markers from the claim identify the person (and all related claims) with that characteristic (e.g. diabetes identified by a specific medication). These are very explicit definitions (e.g. Boolean algorithms); the more the definition corresponds to the patient (rather than the claim), the cleaner.
vi. Patients in the insurer data base who have not generated claims can be included through the eligibility file, which includes all subscribers and beneficiaries (there is also a separate provider data base)
1. Every month of coverage is represented by a record for each patient in the eligibility data base
2. A break in the record indicates change in coverage
3. Markers for identifying changes: January and July of each year; milestone ages (65 and 26)
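A minimal sketch of how the monthly eligibility records could be turned into continuous coverage spans, so that a break in the record shows up as a new span. The `YYYY-MM` month format and the sample months are assumptions for illustration; the actual eligibility file layout is not described in these notes.

```python
def month_index(ym: str) -> int:
    """Convert 'YYYY-MM' to a running month count so consecutive months differ by 1."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def coverage_spans(months):
    """Group one person's covered months into (first_month, last_month) spans."""
    ordered = sorted(set(months), key=month_index)
    spans = []
    for ym in ordered:
        if spans and month_index(ym) == month_index(spans[-1][1]) + 1:
            # This month directly follows the current span, so extend it.
            spans[-1][1] = ym
        else:
            # First month, or a gap in the record: start a new span.
            spans.append([ym, ym])
    return [tuple(s) for s in spans]


# Hypothetical eligibility months for one patient (February 2012 is missing).
eligibility = ["2011-10", "2011-11", "2011-12", "2012-01", "2012-03"]
spans = coverage_spans(eligibility)

print(spans)
```

More than one span for a person signals a break in the record. Span boundaries that fall in January or July, or near ages 26 and 65, would be the expected change points noted above.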
d. VHCURES studies have not been published yet; this makes for a good start to a FINER topic under any circumstances. Some caveats:
i. Cleaning the data will take a little more time. Good to start thinking about research questions now; requests could be planned as early as April 2013.
ii. IRB clearance is required
iii. Must bring a bag of cookies
e. A limited scope project to consider now: controlled substances prescribed by primary care providers could be used to look at new users of opiates (given that we don’t know how clean the patient MPI is). Next step: refresh the data set (a 97 second transaction).
i. Begin to analyze
ii. Run and compare with PRISM data – a source of validation, along with the FAHC warehouse
iii. Moms and babies: no methadone (given in clinics without a claim); covers all prescriptions; does not include medications provided during the hospital stay. However, if most moms are on Medicaid, DIVA might be a better source – or good to compare the two as another method of validation.
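The "new user" idea behind this project (and the ED example under item c) can be sketched as: take an index event, exclude anyone with an opiate fill in the previous 12 months, then look forward for later fills. Everything here is hypothetical: the people, dates, and the choice to approximate 12 months as 365 days are assumptions for illustration.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = 365  # "previous 12 months" approximated as 365 days (assumption)


def is_new_user(index_date, opioid_fill_dates, lookback_days=LOOKBACK_DAYS):
    """True if there is no opioid fill in the lookback window before the index date."""
    window_start = index_date - timedelta(days=lookback_days)
    return not any(window_start <= d < index_date for d in opioid_fill_dates)


def fills_after(index_date, opioid_fill_dates):
    """Fills on or after the index date, in chronological order."""
    return sorted(d for d in opioid_fill_dates if d >= index_date)


# Hypothetical people: an index event (e.g. an ED visit) plus opioid fill dates.
people = {
    "A": {"index": date(2011, 6, 1),
          "fills": [date(2011, 6, 1), date(2011, 7, 1), date(2011, 8, 1)]},
    "B": {"index": date(2011, 6, 1),
          "fills": [date(2011, 1, 15), date(2011, 6, 2)]},
    "C": {"index": date(2011, 6, 1), "fills": []},
}

# Keep only "cold starts" and record their subsequent fills.
cold_starts = {
    pid: fills_after(p["index"], p["fills"])
    for pid, p in people.items()
    if is_new_user(p["index"], p["fills"])
}

print(cold_starts)
```

Person B is excluded because of the fill five months before the index date. Counting how many cold starts go on to repeated fills is what would address the "get stuck" question, with the caveat that fills paid out of pocket or under another payer would not be visible.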
f. Candidate questions:
i. Methodological: Can we link babies and moms?
1. Can we study their utilization?
ii. Methodological: Can we find hospitalizations and match them?
iii. Methodological: Can we find incidents leading to opiate use and track the natural history?
iv. Match to birth registry, DIVA, and DMV…
g. Next steps
i. Charlie to add everyone at CROW as key personnel on the study protocol with the IRB
ii. Charlie to get refreshed data for his data set from Steve soon
iii. CROW to work on together – see below
h. Thank you Steve!
a. Feb 14: Abby: Breastfeeding manuscript (no Ben)
b. Feb 21: Kairn: F31 (no Amanda)
c. Feb 28: Rodger – PCORI (no Connie, no Kairn)
d. Mar 7: Connie: manuscript review (no Ben, no Kairn)
e. Mar 14: Charlie: VHCURES Opiate Data Mining (everyone will be here!)
f. Future agenda to consider:
i. Christina Cruz, 3rd year FM resident with questionnaire for mild serotonin withdrawal syndrome?
ii. Peter Callas or other faculty on multi-level modeling
iii. Charlie MacLean: demonstration of Tableau
Recorder: Connie van Eeghen
Posted by Connie at 2/12/2013 01:16:00 PM