All aspects of the study were reviewed and approved by the Institutional Review Board at Hebrew Rehabilitation Center. The “VIBES” (Vibration to Improve Bone Density in Elderly Subjects) study is a randomized, double-blind, sham platform-controlled device trial of the efficacy of daily low-magnitude, high-frequency vibration to increase bone mineral density, balance, muscle mass and strength in men and women over the age of 60 who reside in independent living settings, and who have osteopenia of the hip or spine (bone mineral density between 1 and 2.5 standard deviations below young normal reference values) at baseline testing. The sample and the inclusion and exclusion criteria have been described previously; in general, participants had an average age of 82.3 years (range 65–103) and were cognitively intact, relatively well-educated, mostly Caucasian men and women living around the Boston area. Study participants were recruited from 15 independent living facilities and informed consent was obtained from all.
VIBES participants were asked to stand on a vibrating platform seven days a week, for ten minutes each day, for up to three years. The device delivered a vibration at 30 Hz (cycles per second) and 0.3 g acceleration (where g is the acceleration due to Earth’s gravity, 9.8 meters per second squared). This low-magnitude vibration is often described as a buzzing sensation, and is considered safe by ISO standards for up to 4 hours each day. The sham group stood on an identical device that did not deliver this vibration. There was no scheduled time for treatment on any given day; participants were simply instructed to use the device once a day at a time convenient to them. Daily adherence to study treatment was tracked using both electronic and self-reported written logs. More specifically, each day when participants used the vibrating platforms, they were asked to indicate the date and time of their session in a paper log book. Their session time on the VIBES platform was also recorded by the machine itself using a unique person-specific radiofrequency identification (RFID) card system.
Self-reported paper logs
Participants were asked to sign a pre-designed paper log book at the start of each daily session; the log book was located in the same room as the platform. Dates were listed in a column with a row of empty boxes for participants to initial under their name. On the first study day of platform use, participants were individually instructed by research staff to write their initials under their name on the proper date and to note the approximate time that they stepped onto the platform. Paper logs were collected bimonthly and manually double entered into a database over the first year of the clinical trial.
Each platform session was initiated by an assigned RFID card specific to the person and his or her assigned platform. Sham platforms used in this trial were initiated by a similar card but participants assigned to these platforms did not receive the prescribed low magnitude vibration. Each participant was given his or her own card to use and instructed to use only that RFID card. When the machine was activated, a record was created in the platform computer memory card containing the date and time, length of the session, and any interruption during the 10-minute session. Each platform’s memory device was periodically downloaded and sent to the data coordinating center.
Early in the study, while reviewing the RFID-recorded data from several platforms, we observed multiple sessions by individuals on the same date; further, these sessions were recorded as taking place at late hours during the night. Ultimately it was discovered that the odd hours and duplicate sessions could be accounted for by a platform clock malfunction in 10 of the 38 platforms at 4 of the 16 study sites and that a simple subtraction of 12 hours corrected the problem. Paper logs were used to confirm the 12-hour clock error. Two additional platforms reported similar problems, but a 12-hour correction did not appear to be the explanation; instead, comparison with the paper logs indicated that one clock was ahead by 4.5 hours and the other by 5 hours. A total of 44 out of 136 participants were affected by these adjustments. These errors were corrected and the corrected data for these observations were retained in the analyses, as we felt the corrected clock errors would not invalidate our ability to compare the two methods.
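The clock correction described above amounts to shifting each affected timestamp back by a fixed, platform-specific offset. The following Python fragment is only an illustrative sketch, not the procedure actually used in the study: the platform identifiers and the lookup table are hypothetical, and only the offset sizes (12, 4.5 and 5 hours) come from the text.

```python
from datetime import datetime, timedelta

# Hypothetical lookup table: hours by which each faulty platform clock ran ahead.
# The platform IDs are invented for illustration; the offsets are those reported.
CLOCK_OFFSET_HOURS = {
    "platform_A": 12.0,  # the common 12-hour error
    "platform_B": 4.5,   # one of the two atypical platforms
    "platform_C": 5.0,   # the other atypical platform
}


def correct_timestamp(platform_id, recorded):
    """Return the session time with the platform's known clock offset subtracted.

    Platforms that are not in the table are assumed to have a correct clock.
    """
    offset = CLOCK_OFFSET_HOURS.get(platform_id, 0.0)
    return recorded - timedelta(hours=offset)


# A session logged at 02:15 on a platform with the 12-hour error
# is moved back to 14:15 on the previous calendar day.
corrected = correct_timestamp("platform_A", datetime(2010, 3, 2, 2, 15))
```

Because the correction can move a session across midnight, the session date as well as the time may change, which is consistent with the spurious same-day duplicates noted above.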
In a few additional cases we discovered that duplicate sessions recorded by the devices resulted from RFID card sharing between participants or from a participant attempting to do two sessions in one day after missing the previous day. Unlike the clock errors, which were corrected before analysis (noted above), these observations were not corrected before analysis because they were felt to be due to the behavior of the individual participants.
Age was obtained by interview, and height and weight were measured using a calibrated scale and stadiometer at baseline. Cognitive status was assessed using the Short Blessed Test (SBT) during the consenting process. Time in study was calculated as the number of days from the first time that a participant logged in or initiated an electronically recorded session until the last day the same participant recorded a session by either method for up to one year of participation. In some instances the last day occurred because the participant dropped out (N=2), or because an individual participant had accumulated less than a year of follow-up when the data were locked for analysis (N=56). A health problems list was used to query whether a participant had been told by a physician in the past year that he/she has the listed disease or chronic health problem.
Adherence data were included from participants who had been in the study between 6 and 12 months. Adherence for both the self-reported paper logs and electronically monitored methods was calculated by dividing the number of completed sessions by the number of sessions expected as per protocol. The protocol specified that 10 minute sessions should occur daily; however, since the paper logs did not include the length of treatment, adherence recorded by the devices was defined as the number of sessions initiated relative to the number of possible sessions.
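As a minimal sketch of the adherence calculation, the Python fragment below divides the number of recorded sessions by the number of days in the study window. It is an illustration under stated assumptions rather than the study's analysis code: the function name is hypothetical, the window is assumed to include both the first and last day, and each initiated session is counted (so uncorrected duplicate sessions would also count).

```python
from datetime import date


def adherence(session_dates, first_day, last_day):
    """Proportion of expected daily sessions that were recorded.

    Assumptions (not stated in the protocol text): the study window is
    inclusive of both endpoints, one session is expected per calendar day,
    and every initiated session inside the window is counted.
    """
    expected = (last_day - first_day).days + 1
    completed = sum(1 for d in session_dates if first_day <= d <= last_day)
    return completed / expected


# Example: 8 recorded sessions over a 10-day window gives 0.8.
sessions = [date(2010, 1, d) for d in (1, 2, 3, 4, 6, 7, 9, 10)]
example = adherence(sessions, date(2010, 1, 1), date(2010, 1, 10))
```

The same function can be applied separately to the paper-log dates and to the RFID-recorded dates to obtain the two adherence values compared for each participant.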
We used an intraclass correlation coefficient (ICC) to determine agreement between the two methods of adherence measurement. We used an ICC, rather than a standard paired (interclass) correlation, because the data were organized by adherence method, rather than as paired measures by individual, and because we were primarily interested in the similarity of the measured adherence between methods. The ICC was calculated via two-way mixed effects modelling, treating subject as a random effect and method of adherence as a fixed effect, with a consistency definition as described by McGraw and Wong.
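For a two-way layout with n subjects and k methods, the single-measure consistency ICC in the McGraw and Wong framework can be written in terms of the between-subject mean square (MS_R) and the residual mean square (MS_E) as (MS_R − MS_E) / (MS_R + (k − 1) MS_E). The Python sketch below computes this from a subjects-by-methods table. It is offered only as an illustration of the formula, not as the SAS procedure used for the analyses, and the function name is hypothetical.

```python
def icc_consistency(scores):
    """Single-measure consistency ICC for an n-subjects x k-methods table.

    `scores` is a list of rows, one row per subject and one column per method.
    Mean squares come from the usual two-way ANOVA decomposition without
    replication.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between methods
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                   # residual

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Columns: [self-reported adherence, electronic adherence]; values are made up.
# A constant shift between methods leaves the consistency ICC at (about) 1.
shifted = icc_consistency([[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]])
```

Under the consistency definition, a systematic offset between methods (a column effect) is removed from the error term, so it does not lower the coefficient; an absolute-agreement definition would penalize such an offset.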
A priori we hypothesized that the agreement between self-reported and electronically monitored adherence recording might differ by sex, age, number of months in the study, and cognition. Therefore, we stratified the ICC analyses by age (less than versus greater than or equal to the median), sex, cognition based on Short Blessed Test scores (less than versus greater than or equal to the median), and time in study. Expecting that adherence might change over time, we divided the time in study into three groups: six months (the minimum required participation time) to less than nine months, nine months to less than twelve months, and twelve months. We used SAS Version 9.2 (SAS Institute, Cary, NC) for all analyses.
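The stratification rules can be sketched as two small helper functions. This Python fragment is illustrative only: the function names and group labels are hypothetical, and time in study is assumed to have already been converted from days to months.

```python
from statistics import median


def median_split(values):
    """Label each value as below, or at or above, the sample median."""
    cut = median(values)
    return ["below" if v < cut else "at_or_above" for v in values]


def time_group(months):
    """Assign a time-in-study group; six months is the minimum for inclusion."""
    if months < 6:
        raise ValueError("less than the six-month minimum participation time")
    if months < 9:
        return "6 to <9 months"
    if months < 12:
        return "9 to <12 months"
    return "12 months"


# Made-up ages, for illustration: the median is 82.5.
age_strata = median_split([70, 80, 90, 85])
```

The ICC would then be computed separately within each stratum (for example, within each age group and within each time-in-study group).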
To examine individual-level data, we compared electronic adherence with self-reported adherence using a Bland-Altman plot. This plot indicated whether the difference (discrepancy) between the two measures of adherence was related to the level of adherence (i.e., there is a systematic bias in the measurements) or was randomly distributed across the range of adherence (i.e., measurements are unbiased). Individual observations outside of the two standard deviation limits are indicative of particularly large discrepancies.
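The quantities behind a Bland-Altman plot can be sketched as follows: for each participant, the difference between the two measures is plotted against their mean, with horizontal lines at the mean difference and at two standard deviations on either side. The Python fragment below computes these values without plotting. It is a minimal illustration with hypothetical names and made-up data, and the direction of the difference (electronic minus self-reported) is an assumption.

```python
from statistics import mean, stdev


def bland_altman(electronic, self_reported):
    """Return the per-participant means and differences, the mean difference
    (bias), the two-standard-deviation limits, and the indices of
    observations falling outside those limits."""
    diffs = [e - s for e, s in zip(electronic, self_reported)]
    means = [(e + s) / 2 for e, s in zip(electronic, self_reported)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    lower, upper = bias - 2 * sd, bias + 2 * sd
    outside = [i for i, d in enumerate(diffs) if d < lower or d > upper]
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": lower,
        "upper": upper,
        "outside": outside,
    }


# Made-up adherence proportions for four participants.
result = bland_altman([0.8, 0.9, 0.7, 0.6], [0.9, 0.9, 0.8, 0.7])
```

Plotting `diffs` against `means` and looking for a trend is what reveals whether the discrepancy depends on the level of adherence.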