Safety concerns with consumer-facing mobile health applications and their consequences: a scoping review

Corresponding Author: Farah Magrabi, PhD, Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Level 6, 75 Talavera Road, North Ryde NSW 2113, Sydney, Australia; farah.magrabi@mq.edu.au

Journal of the American Medical Informatics Association, Volume 27, Issue 2, February 2020, Pages 330–340, https://doi.org/10.1093/jamia/ocz175

10 October 2019 29 August 2019 Revision received: 05 September 2019 23 September 2019 10 October 2019

Saba Akbar, Enrico Coiera, Farah Magrabi, Safety concerns with consumer-facing mobile health applications and their consequences: a scoping review, Journal of the American Medical Informatics Association, Volume 27, Issue 2, February 2020, Pages 330–340, https://doi.org/10.1093/jamia/ocz175

Abstract

Objective

To summarize the research literature about safety concerns with consumer-facing health apps and their consequences.

Materials and Methods

We searched bibliographic databases including PubMed, Web of Science, Scopus, and Cochrane libraries from January 2013 to May 2019 for articles about health apps. Descriptive information about safety concerns and consequences was extracted and classified into natural categories. The review was conducted in accordance with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) statement.

Results

Of the 74 studies identified, the majority were reviews of a single app or a group of similar apps (n = 66, 89%), and nearly half related to disease management (n = 34, 46%). A total of 80 safety concerns were identified, 67 of which related to the quality of information presented, including incorrect or incomplete information, variation in content, and incorrect or inappropriate response to consumer needs. The remaining 13 related to app functionality, including gaps in features, lack of validation for user input, delayed processing, failure to respond to health dangers, and faulty alarms. Of the 52 reports of actual or potential consequences, 5 had potential for patient harm. We also identified 66 reports about gaps in app development, including the lack of expert involvement, poor evidence base, and poor validation.

Conclusions

Safety of apps is an emerging public health issue. The available evidence shows that apps pose clinical risks to consumers. Involvement of consumers, regulators, and healthcare professionals in development and testing can improve quality. Additionally, mandatory reporting of safety concerns is needed to improve outcomes.

INTRODUCTION

Advancements in digital technologies have provided consumers with access to a wide range of resources to manage their health. Health apps, software programs that run on smartphones and other mobile communication devices, are an important example because they provide a variety of ways to engage and empower consumers. The number of health apps has soared in the last few years, and by the end of 2017, there were almost 325 000 health apps available on the leading app stores. 1 Alongside the rising number of apps, demand for them is also growing. Approximately 3.8 billion apps were downloaded in 2017, which was a 16% increase from 2016. 1

Health apps provide a range of facilities from simple reminders and record-keeping diaries to complex medical devices. 2 They are accessible at all times and they let consumers manage chronic diseases such as diabetes, support lifestyle changes to aid weight loss and smoking cessation, and even promote self-diagnosis. 3 , 4 Many health apps utilize mobile phone features such as cameras and Bluetooth to allow users to record behavioral data such as activity and food intake. 5 Such applications make it easier for consumers to manage their health over time by setting goals and reminders. Apps typically use techniques such as text messaging, access to personal health records, and telemedicine or telehealth to engage with their consumers. 3 They also offer educational resources for consumers with varying degrees of health and digital literacy.

While apps have the potential to benefit consumers by offering interactive tools that help with treatment adherence and by improving access to information, 6 they can also pose safety risks if they are inaccurate and unreliable, mainly because consumers may use the information from apps to make decisions about their health. 7 Recently, discussions about potential risks and consequences of health apps have increased; however, there is a lack of consolidated evidence in this area. While the existing literature examines the effectiveness of apps, 8 safety risks are generally discussed as part of other objectives such as when developing frameworks to assess apps 7 , 9 and reviewing regulatory implications. 10 Additionally, previous reviews have mainly focused on the quality of apps that target specific health conditions, such as diabetes 11 or asthma. 12 A limited number of studies take a broader look at the risks of using health apps. For example, one study that reviewed apps used by both consumers and providers suggested that apps can pose dangers to physical integrity, bodily well-being, mental well-being, and the privacy of consumers. 13 To the best of our knowledge, no study has specifically reviewed safety risks of consumer-facing apps. To address this gap, we conducted a scoping review to summarize the research literature about safety concerns with consumer-facing health apps and their consequences.

MATERIALS AND METHODS

We focused on studies reporting safety concerns with consumer-facing health apps intended for use primarily by specific patient groups or general populations. 14 Based on our previous studies, safety concerns were defined as problems with apps that posed actual or potential risks of harm to consumers. 15

Search strategy

Bibliographic databases including PubMed, Web of Science, Scopus, and Cochrane libraries were searched in June 2017 and updated in May 2019. The search query used was (“safety” OR “risk*” OR “error*” OR “concern*” OR “problem*” OR “challenge*” OR “failure” OR “quality”) AND (“app*” OR “application*”) AND (“smartphone” OR “mobile” OR “mHealth” OR “patient facing”). Appropriate vocabulary terms were included (Supplementary Appendix A) and the retrieval set was limited to articles published in 2013 or later.
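The structure of the query, three OR-groups joined by AND, can be sketched in a few lines of Python. This is only an illustration of how the stated terms combine. The names `or_group` and `build_query` are hypothetical, straight quotes stand in for the typographic quotes above, and database-specific syntax and the controlled vocabulary terms from Supplementary Appendix A are not modeled.

```python
# Illustrative sketch: assemble the boolean search string from its three concept groups.
# Function and variable names are hypothetical; the terms are copied from the search strategy.

SAFETY_TERMS = ["safety", "risk*", "error*", "concern*",
                "problem*", "challenge*", "failure", "quality"]
APP_TERMS = ["app*", "application*"]
MOBILE_TERMS = ["smartphone", "mobile", "mHealth", "patient facing"]


def or_group(terms):
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query():
    """Join the three concept groups with AND."""
    groups = (SAFETY_TERMS, APP_TERMS, MOBILE_TERMS)
    return " AND ".join(or_group(group) for group in groups)


if __name__ == "__main__":
    print(build_query())
```

Keeping each concept group in its own list makes it easy to see that an article must match at least one term from every group to be retrieved.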

Study selection

The review was conducted in accordance with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews). 16 After the initial search, duplicate entries and those that were either an erratum or response to another published article were removed (Figure 1). The titles and abstracts of the remaining articles were screened by a single reviewer to identify relevant studies. Study designs were limited to analyses of apps, pilot tests, randomized controlled trials, and systematic reviews. Non-English articles and conference abstracts were excluded. Full-length articles were retrieved for all abstracts identified for inclusion and were assessed independently against the inclusion criteria by two reviewers (SA and FM). Articles that did not meet any of the inclusion criteria were excluded and any disagreements about inclusion or exclusion of an article were resolved by consensus.

Figure 1. Article search and retrieval process.

Data extraction and synthesis

For each included study, descriptive information about the apps studied, study design, sample size, safety concerns, and consequences was extracted.

Type of apps

The type of apps was examined using the U.S. Food and Drug Administration (FDA) classification (high-risk mobile medical apps, low-risk mobile medical apps, nonmobile medical apps). The FDA regulates high-risk medical apps and exercises enforcement discretion for low-risk medical apps. 17

Health domains

Health domains addressed by apps were examined using the categories of consumer mobile health apps described by Kao and Liebovitz (wellness management, disease management, self-diagnosis, medication reminder, physical medicine and rehabilitation). 18

Consumer engagement strategy

Consumer engagement was examined using categories from the framework developed by Singh et al (providing educational information; reminding or alerting users; recording and tracking health information; displaying and summarizing health information; providing guidance based on information entered by the user; enabling communication with clinicians, family members, and caregivers; providing support through social networks; supporting behavior change through rewards). 19

Categorization of safety concerns

Using a grounded approach, both reviewers iteratively examined descriptive information about safety concerns to identify natural categories relating to the quality of apps themselves and the processes undertaken to develop them. 20

Consequences

  1. Potential or actual harm to a consumer: A safety concern that reached consumers (eg, an app yielded a wrong dose for insulin that the user followed and became hypoglycemic).
  2. An arrested or interrupted sequence or a near miss: A safety concern that was detected before reaching consumers (eg, a consumer recognized the incorrect recommendation in the app and did not follow it).
  3. Noticeable consequence but no harm: A concern that affected use of the app but no harm to consumers (eg, time wasted waiting for an app to function correctly).
  4. No noticeable consequence: A concern that did not directly affect safe use of the app (eg, consumers did not like the font used to present information in the app but did not stop using it).
  5. Hazardous event or circumstance: A concern that could potentially lead to an adverse event or a near miss (eg, an app underdiagnosed a malignant skin lesion, falsely reassuring consumers).

The two reviewers independently examined free-text event descriptions of safety concerns to assess consequences. Interrater reliability analysis using the kappa statistic was performed to determine consistency between the reviewers. When reviewers disagreed, the report was re-examined and a consensus category assigned. The interrater reliability (kappa) for the classification was 0.92 (P < .001; 95% CI, 0.77-1.0). A narrative synthesis then integrated findings into descriptive summaries for each category of safety concerns.
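For readers unfamiliar with the kappa statistic, the following minimal sketch shows how agreement between two raters is corrected for chance. It assumes Cohen's kappa for two raters, which the text does not state explicitly. The function name and the example labels are hypothetical and are not the review's data.

```python
# Minimal sketch of Cohen's kappa for two raters (assumed variant; not the authors' code).
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Return chance-corrected agreement between two equal-length label lists."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement: chance overlap given each rater's label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label for every item
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical consequence labels for four reports.
    reviewer_1 = ["hazard", "hazard", "harm", "no_harm"]
    reviewer_2 = ["hazard", "hazard", "harm", "hazard"]
    print(round(cohens_kappa(reviewer_1, reviewer_2), 3))
```

In this toy example the reviewers agree on 3 of 4 reports (observed agreement 0.75), but kappa is lower because some of that agreement would be expected by chance.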

RESULTS

Our search returned a total of 3456 titles and abstracts. After removal of duplicate entries and errata, 2388 abstracts were screened. Of these, 2314 studies were excluded for various reasons, leaving 74 studies that provided reports about safety concerns of mobile health apps (Figure 1).

Descriptive analysis of selected studies

Of the 74 studies included in our review, 17 (23%) were published in 2018 (Table 1; Supplementary Appendix B). The majority were reviews of apps (n = 66, 89%), which evaluated the quality of content and functionality of either a single app or a group of apps targeting a specific audience, for example, apps for diabetes management. 3 The number of apps reviewed in these studies ranged from 1 to 756. Only 11 of these reviews examined apps in the hands of consumers. Four of the included studies were literature reviews about groups of apps, 21–23 3 were randomized controlled trials, 24–26 and 1 was a nonrandomized controlled trial. 27

Table 1. Characteristics of studies reporting safety risks of mobile health apps (N = 74)

| Characteristics | n | % |
|---|---|---|
| Study design | | |
| review of app(s) | 66 | 89 |
| literature review | 4 | 5 |
| randomized controlled trial | 3 | 4 |
| nonrandomized controlled trial | 1 | 1 |
| Year of publication | | |
| 2013 | 6 | 8 |
| 2014 | 8 | 11 |
| 2015 | 11 | 15 |
| 2016 | 15 | 20 |
| 2017 | 14 | 19 |
| 2018 | 17 | 23 |
| 2019 (until May) | 3 | 4 |
| Types of apps | | |
| High-risk medical device | 7 | 9 |
| Low-risk medical device | 42 | 57 |
| Wellness | 25 | 34 |
| Consumer engagement functionalities a | | |
| provides educational information | 49 | 66 |
| reminds or alerts patient | 19 | 26 |
| tracks information | 52 | 70 |
| displays and summarizes user-entered information | 54 | 73 |
| provides guidance based on user-entered information | 49 | 66 |
| enables communication with family or clinician | 14 | 19 |
| provides support through social networks | 20 | 27 |
| rewards behavior change | 14 | 19 |
| Domain 18 | | |
| disease management | 34 | 46 |
| wellness management | 26 | 35 |
| self-diagnosis | 6 | 8 |
| medication reminder | 2 | 3 |
| physical medicine and rehabilitation | 5 | 7 |
| all health domains | 1 | 1 |

a Categories are not mutually exclusive. One study may include apps using multiple engagement functionalities.

Type of app, domain and consumer engagement strategy

We examined the type of apps using the FDA classification. 17 Most studies related to low-risk medical devices (n = 42, 57%) or wellness (n = 25, 34%). Seven (9%) related to high-risk medical device apps including apps for monitoring vital signs, 23 , 28 and diagnosis of disease including melanoma 29–32 and color blindness. 33

Almost half of the studies related to apps for managing a specific disease (n = 34, 46%), with the remainder focusing on apps for wellness management, self-diagnosis, physical medicine, and medication reminders (Table 1; Supplementary Appendix C). Most studies involved apps that engaged consumers using 1 or more functionalities, such as displaying and summarizing user-entered information (n = 54), tracking data (n = 52), providing guidance (n = 49), and providing educational information (n = 49).

Safety concerns

Our examination of the 74 studies revealed 80 safety concerns that were extracted and categorized. The vast majority of safety concerns related to the quality of the content presented in apps (n = 67, 84%). Only 13 safety concerns (16%) related to software functionality, that is, how well the app's user interface and functions match its consumer engagement strategies.

Quality of content

  1. Incorrect information : Reports about wrong information being presented by apps were common. 22 , 24 , 34–47 For example, apps for bipolar disorder (BD) incorrectly 47 differentiated between BD types, and wrongly recommended that patients should “take a shot of hard liquor an hour before bed.” 40 The same group of apps also suggested that BD is contagious (ie, it can be spread to anyone who spends a lot of time with a BD patient). 38 , 40 Another study reported that urolithiasis apps recommended lowering calcium intake, which was incorrect and contradicted the available evidence. 38
  2. Incomplete information : Apps were also reported to provide incomplete information to consumers. 32 , 36 , 46–56 For example, insulin dose calculation apps did not offer a mechanism for reducing amount of postmeal administration that was required because the body may produce its own residual insulin 50 and breast cancer apps were silent about hormonal receptors and the new classification based on the HER (human epidermal receptor). 36 Moreover, apps were also found to miss critical information about health conditions. For example, one study found that apps to diagnose pigmented lesions did not offer clear recommendations about what consumers should do when they receive a provisional negative diagnosis for their lesion. 32 Similarly, many apps that support nutrition allowed users to set dietary goals but did not provide guidance on appropriate goal setting. 55 Another study that assessed the content of the top 5 cardiovascular apps against European guidelines found that only 1 app contained 6 of 8 key topic areas. Surprisingly, none of these apps addressed regular medical follow-up and smoking cessation, which were identified as key topics in the guidelines. 51
  3. Variation in content : Apps that addressed similar domains were found to have significant differences in the quality of their content. 33 , 37 , 47 , 57–61 For example, studies reported inconsistencies in the information presented and tools used in apps for obesity management, 58 physical activity measurement, 59 color vision assessment, 33 and medication self-management. 60 Likewise, apps for pain management required users to enter varying amounts of information to clinically assess symptoms. 57

Box 1. Example safety concerns relating to content quality

Incorrect information: An app for sexually transmitted infections suggested that “Genital warts are bad. If they form in a bunch on your genitals, you will have a very bad time getting them treated and your relationships will shatter.” Another app noted that, “Candida (found in yeast infections) can infect your blood, causing an overload of toxins to disrupt your system, wreaking havoc on your mind and body.” 62 Fetal heart rate monitoring apps provided incorrect statements about normal heart rate, heart rate differences between genders and warnings such as hot foods and rinds of papayas causing miscarriage. 44

Incomplete information: Exercise apps lacked information about indication, frequency, or description of performing the recommended activity. 41 , 46 Another study found that only 4 of 33 (12%) depression apps provided guidance regarding crisis management. 63

Variation in content: Apps that provide guidance about decreased fetal movement had varying information about normal frequency, with some suggesting that 10 kicks felt in 2 hours should be reassuring, while others suggested 10 movements experienced over 12 hours or 1 hour was normal. 47

Incorrect output: Apps to monitor heart rate produced incorrect measurements with absolute differences of over 20 beats/min 64 ; melanoma risk assessment apps underdiagnosed potentially life-threatening melanomas. 30 In another study, Blood Alcohol Concentration (BAC) apps were found to overestimate BAC levels by approximately 3 times. 24 The Instant Blood Pressure app, which estimates blood pressure using the patient’s index finger and positioning along the chest wall, could not detect hypertensive blood pressure ranges. 23

Inappropriate response to consumers’ needs: Of the 121 apps targeting high-risk, high-cost populations, that allow patients to record health-oriented information, only 28 (23%) responded appropriately when information was entered that indicated a health danger, such as suicidal mood or ideation. 12 , 65 In another study, 21 of 33 apps for depression did not include content aimed at encouraging professional help seeking when needed. 63

Software functionality

  1. Gaps in features : Many studies found that apps did not adequately support consumer tasks. 52 , 53 , 60 , 70–72 For example, some medication self-management apps did not support oral contraceptives, medications to be taken as needed (PRN), over-the-counter drugs, and variable-dose medications. 60 Others did not allow users to enter dose in grams instead of milliliters or to customize their dosing schedule. 72 Similarly, alcohol cessation apps lacked important features for motivation, identification of risk situations and coping strategies for relapse. 70
  2. Lack of validation for user input : Validation of the data entered by users, which is an important first step, was reported to be missing in many apps. For example, insulin dose calculation apps did not have facilities for simple numeric validation to prevent missing values and text entries in the fields intended for users to enter blood glucose values. 50 These apps were reported to allow calculations despite missing 1 or more values. In other cases, apps did not allow users to change values that had been entered incorrectly. 26
  3. Delayed processing : There were reports about the time taken by apps to process information and generate outputs, which could critically affect consumer safety. For example, vital signs monitoring apps, which are considered high-risk mobile medical apps, were reported to measure ECG with a delay of 30-60 seconds. 67
  4. Response to health dangers : Apps were reported to be unresponsive to safety-critical information entered by users. For instance, bipolar disorder apps failed to provide any response when information about severe extremes of mood or suicidal ideation was entered. 40
  5. Faulty alarms : Reminders, which are usually generated in the form of alarms, are one of the basic consumer engagement functionalities of health apps. 19 Two studies of apps for medication self-management found problems with in-built alarms. 60 , 72

Box 2. Example safety concerns relating to software functionality

Gaps in features: Teledermatology apps did not account for allergies or current medication status, both of which could affect the behavior of the app. 52

Lack of validation for user input: Apps for insulin dose calculation lacked standard terminology and simple numeric validation, creating latent conditions that increased the risk of unintentional data entry slips and mistakes related to misunderstanding. 50 A teledermatology app that generated pick lists for users did not contain the required option. In one instance, the user wanted to indicate allergy to sulfa drugs generally but was forced to choose specific ones listed. 52

Delayed processing: Smartphone-based electrocardiography measurement had a delay of 30-60 seconds. 67

Response to health dangers: While apps asked users to input personal health data, very few responded to indications that users were unwell. In fact, only 3 of 35 symptom monitoring apps responded to users indicating severe extremes of mood or suicidal ideation. 40

Faulty alarms: Alarms and notifications in the medication self-management apps were either too loud or not loud enough, did not provide snooze options, or failed to work when phone screens were off. 60 , 72
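The kind of simple numeric validation reported as missing from insulin dose calculators can be illustrated with a short sketch. It rejects missing values, text entries, and implausible numbers before any calculation is attempted. The function name, the unit, and the plausibility bounds are assumptions made only for illustration. They are not taken from the reviewed studies and are not clinical guidance.

```python
# Illustrative input validation for a blood glucose field (hypothetical names and bounds).
import math

# Assumed plausibility bounds in mmol/L, for illustration only; not clinical guidance.
MIN_GLUCOSE_MMOL_L = 1.0
MAX_GLUCOSE_MMOL_L = 40.0


def parse_blood_glucose(raw):
    """Return the entered value as a float, or raise ValueError explaining the problem."""
    if raw is None or not raw.strip():
        raise ValueError("blood glucose value is missing")
    try:
        value = float(raw.strip())
    except ValueError:
        raise ValueError("blood glucose value must be a number") from None
    if not math.isfinite(value):
        # float() accepts "nan" and "inf", which are not usable readings.
        raise ValueError("blood glucose value must be a finite number")
    if not MIN_GLUCOSE_MMOL_L <= value <= MAX_GLUCOSE_MMOL_L:
        raise ValueError("blood glucose value is outside the plausible range")
    return value


if __name__ == "__main__":
    for entry in ["5.6", "", "abc", "500"]:
        try:
            print(repr(entry), "->", parse_blood_glucose(entry))
        except ValueError as error:
            print(repr(entry), "-> rejected:", error)
```

Raising an error instead of silently substituting a default forces the calling code to ask the user to correct the entry, which addresses the reported behavior of calculating a dose despite missing values.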

Consequences of safety concerns

While there were no reports about near miss events, 4 safety concerns had noticeable consequences (8%). A review of apps for type 2 diabetes risk assessment reported that false positives generated by unvalidated risk scores could overwhelm services. 66 Similarly, apps for melanoma risk assessment were reportedly overdiagnosing benign nevi, leading to an unnecessary drain on dermatology resources. 30 Errors in insulin dosage calculator apps 50 and false readings from fetal heart monitoring apps 44 were associated with an increase in the use of unscheduled care by patients. 50

Three safety concerns did not have a noticeable consequence on care delivery (6%). Participants in a study comparing the Ishihara booklet with 2 color vision apps rated both apps lower on comfort and clarity. 33 Likewise, a study of an app that claimed to diagnose skin lesions reported that the general population was likely to use the app for all the lesions that they find suspicious, because it was difficult for users to distinguish between benign and cancerous lesions. 31 Another study of cardiovascular apps reported that users could be misinformed due to substandard information quality. 51 While these findings are notable, there were no consequences reported.

Forty (77%) safety concerns were associated with potentially hazardous circumstances. The most frequently reported hazard was the apps’ potential to mislead users by presenting information that was neither evidence based nor endorsed by medical experts. 24 , 25 , 36–38 , 48 , 72–80 Such misleading information or absence of critical information was also reported to potentially worsen users’ health conditions and cause indirect harm. 22 , 24 , 34 , 37 , 41 , 47 , 48 , 50 , 66 , 74 , 77 For example, the study of BAC calculation apps reported that users were provided with information about how much more alcohol they could consume before their driving ability was compromised. Such features may encourage alcohol consumption. 22 , 24 , 37

In one study that evaluated sports coaching apps, researchers found that 23 of 30 apps did not provide instructions about how to choose a workout or how to organize them over a week. The absence of this information could adversely affect users who did not have the appropriate level of preparedness required for each workout. 41 Similarly, misleading information about fetal movements could result in missed opportunities to prevent adverse outcomes such as stillbirth. 47 In addition, another study found that urolithiasis apps recommended consumption of a low-calcium diet, which contradicted evidence and had been shown to be harmful. 38

Another commonly reported hazard was linked with incorrect diagnostic output that could result in false reassurance or unnecessary anxiety. Three studies that evaluated apps for melanoma detection reported incorrect classification of cancerous lesions as “un-concerning.” 29 , 31 , 68 Similarly, apps for diabetes risk assessment reported false negative outputs. 66 If use of such apps replaced a visit to a medical professional, users could be falsely reassured, increasing the risk of harm. 29 , 31 , 32 , 66 Conversely, false positives or higher disease risk scores were reported to increase user anxiety. 44 , 66 , 68 , 77
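The consequence counts reported in this review can be cross-checked with a short tally. The sketch below assumes the 5 reports with potential for patient harm stated in the abstract, together with the counts given in this section. The dictionary keys are shorthand labels chosen for illustration.

```python
# Cross-check of the reported consequence counts (labels are shorthand, not the authors' wording).

CONSEQUENCE_COUNTS = {
    "potential or actual harm": 5,          # from the abstract
    "near miss": 0,                         # "no reports about near miss events"
    "noticeable consequence, no harm": 4,
    "no noticeable consequence": 3,
    "hazardous event or circumstance": 40,
}


def percentage_shares(counts):
    """Return each category's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


if __name__ == "__main__":
    print("total reports:", sum(CONSEQUENCE_COUNTS.values()))
    for label, share in percentage_shares(CONSEQUENCE_COUNTS).items():
        print(f"{label}: {share}%")
```

The five counts sum to the 52 reports of consequences, and the rounded shares match the percentages quoted in the text (8%, 6%, and 77%).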

Gaps in app development

  1. Lack of expert involvement: Many studies found that there was a lack of involvement of subject matter experts in content development. 22 , 29 , 36 , 38 , 42 , 46 , 51 , 55–57 , 73 , 75 , 78–85 For example, only a limited number of apps relating to major vascular diseases, 73 urology, 38 , 81 Alzheimer’s and related dementia, 56 mental health disorders, 22 and chronic pain 57 were developed in consultation with clinical experts or recognized healthcare agencies or organizations. 82
  2. Not evidence based: Many studies found that app content was not based on the available evidence, was not adequately referenced or updated to reflect current evidence, or contradicted the available evidence. 22 , 35 , 36 , 38 , 39 , 42 , 43 , 46 , 47 , 49 , 56 , 63 , 66 , 71 , 75–78 , 85–92 For example, a study that evaluated asthma management apps reported that of the 8 apps that presented a recommendation about removal of pets from the home, only 1 was consistent with evidence. 39 Another study found that apps supporting bariatric or weight loss surgery patients did not provide references for educational information. 86 Only around 10% of depression apps included evidence-based principles. 22 Moreover, a study of exercise apps reported that these apps were not following evidence-based principles set forth by the American College of Sports Medicine. 49 Additionally, in a study of apps that target cancer patients, only 51 of 166 (30%) apps had been updated in the last 2 years. This study also found that the content of some breast cancer apps was obsolete. 36
  3. Poor validation: Lack of formal validation, which is an important indicator of the safety of diagnostic, screening, and assessment tools used within apps, was commonly reported. 21 , 22 , 35 , 39 , 40 , 50 , 59 , 78 , 80 , 82 , 93 , 94 For example, calculators, questionnaires and assessment tools in asthma management apps had not been formally tested. 39 The same group of apps also contained experimental screening products that had not received regulatory approval. 39 In another study of apps for maintaining a headache diary, none of the 38 apps reviewed had been subject to formal testing of psychometric properties. 94

DISCUSSION

While health apps have the potential to provide easy and low-cost access to care, little is known about the types of safety concerns associated with their use. 95 Previous studies have mainly reviewed a limited number of apps 96 or focused on specific areas of risk such as privacy. 97 Our review is the first to summarize the kinds of clinical safety concerns with consumer-facing apps and their consequences. We identified 10 natural categories relating to the quality of apps themselves and the processes undertaken to develop them. Gaps in processes to design and build apps including the lack of expert involvement, evidence base, and validation were also identified.

We found that health apps pose risks to consumer safety when the content presented within apps is inappropriate or software functionality is compromised. Both these components hold equal significance, as weakness in one can negatively affect the other. The 10 categories of safety concerns that we identified were linked with consequences ranging from actual harm to hazardous events. Based on our findings, we make a number of recommendations for app developers, healthcare professionals, regulators, consumers, and researchers.

Developers should take a user-centered approach, involving subject matter experts and consumers in app development. 98 , 99 We found that app development processes often lack the involvement of relevant healthcare professionals or agencies. This finding is consistent with previous reviews. 100 , 101 Experts such as clinicians, technicians, nurses, pharmacists, and therapists possess the right sets of knowledge and skills to lead information design. They have been trained professionally to manage people's health and are usually aware of the current medical guidelines. Moreover, they interact with consumers regularly and are better aware of their concerns. Hence, their absence from the process can lead to poor quality of content. They should be involved at 3 stages 102 : (1) the development phase, when app developers are compiling content for their apps; (2) the internal validation phase, when the information and tools included in apps are validated to confirm that they are correct; and (3) the verification phase, when apps are tested to check that they perform as expected.

Consumers are another group that should be engaged in app development, particularly in usability testing. Similar to the previous studies of self-management apps, 103 , 104 our review indicates that consumers were able to recognize many critical issues with apps, such as incorrect information, inappropriate response to their needs, gaps in features, and faults with alarms. This suggests that involvement of consumers in usability testing will allow problems to be identified and resolved before apps are published. Usability testing allows users to provide important insights about app functionality and medical reliability, 105 helps determine whether the app is convenient for users to perform required tasks, and can reduce costs of fixing errors that may be identified later. 106 Hence, usability testing offers a win-win situation for both app developers and consumers.

In the postdevelopment stage, apps need to be kept up to date to reflect current evidence and should be routinely audited. 95 We found that apps either lacked current evidence or offered information that contradicted the evidence. With frequent updates to the evidence, app developers should carefully plan updates to ensure apps are up to date.

Healthcare professionals should get involved in app development, promotion, and evaluation. Evidence suggests that the majority of providers are open to apps 107 , 108 but hesitate to promote them, mainly because of the difficulty in identifying apps that are effective, 109 shortage of time, legal issues, and data security and privacy concerns. 110 , 111 To improve the situation, professionals can participate in app development processes and collaborate with other providers to perform app evaluation studies. They can also review the scientific literature 112 and have productive discussions with patients about their usability preferences, 110 so as to help patients make informed decisions about the use of apps.

Regulators need to actively monitor and address safety concerns. In the United States, high-risk health apps undergo regulatory checks by the FDA, the Office for Civil Rights, and the Federal Trade Commission for efficacy, information protection, and security breaches, respectively. 65 Other national and regional bodies in Australia, 113 the United Kingdom, 114 New Zealand, 115 and Catalonia and Andalusia in Spain 116 are also working to formulate guidelines for app regulation. However, there is a need to develop a monitoring framework that allows consumers to report safety concerns about apps and assists with their management. One such example is the real-world performance monitoring strategy of the FDA's newly announced digital health software precertification program. 117 Through this program, the FDA plans to collect information about consumers’ experience, software performance, and clinical outcomes and address emerging risks. 117

Consumers need to make more informed choices about apps. At present, consumers encounter hundreds of thousands of health apps when they search app stores. While most consumers prefer using apps that are recommended by their providers or peers, 118 many are still on their own when making the choice. To make a safe and well-informed decision, it is important that consumers carefully read descriptions and outcome reports about the apps that they are considering and check the credibility of the app developers. Relying on the app store rating is not recommended. 105 , 119 Curated libraries of apps established by trusted sources, such as the National Health Service UK, can be useful resources for consumers looking for safe apps. 120 Other proposed strategies such as grading labels may also be useful. 121

The leading app stores encourage users to report any inappropriate content or functionality. Consumers should use these platforms to report any safety issues that they encounter while using apps. They can also give feedback to app developers and to regulatory agencies such as the FDA. 122

More primary studies on app safety are required. There is plenty of literature available on health apps. During our search, we screened 2388 abstracts, more than half of which studied apps. However, only a handful of studies engaged consumers and allowed them to express their concerns relating to the safety of apps. Moreover, to the best of our knowledge, there is no standard method of reporting safety concerns in app testing. There are general frameworks that cover health information technology, such as CONSORT-eHealth (Consolidated Standards of Reporting Trials of Electronic and Mobile Health Applications and Online TeleHealth) 123 and STARE-HI (Statement on reporting of evaluation studies in Health Informatics), 124 but these are limited to specific study types and do not mandate reporting of safety concerns. The reporting of safety concerns and consequences should be mandatory to encourage researchers to evaluate and report safety concerns with apps. Studies are also required to examine the magnitude of the harm from health apps.

LIMITATIONS

This review was restricted to the published literature. The concerns were identified from studies that used a wide variety of designs and may not have captured all possible issues with the different types of apps that are currently available to patients. We did not include the grey literature or any other source of information about apps, such as user complaints and app store reviews. Another limitation is that the screening of titles and abstracts was performed by a single reviewer. Studies that noted safety concerns only in the results or discussion sections may have been missed. We also excluded non-English articles, which limits the generalizability of our findings to apps that target non-English speakers.

CONCLUSIONS

Health apps may have significant potential to improve population health. However, to ensure that this potential is met, it is important that apps are safe, effective, and reliable. The gaps in app development, safety concerns, and consequences found in this review call for increased stakeholder engagement, vigilant regulatory frameworks, and more focused research. The reporting of safety concerns and consequences should be mandated in reporting guidelines. These improvements will build trust and increase the confidence of both providers and consumers.

FUNDING

This research is supported by the Australian National Health and Medical Research Council Centre for Research Excellence in Digital Health grant 1134919 (FM and EC). The funding source did not play any role in study design, in the collection, analysis, and interpretation of data, in the writing of the report, or in the decision to submit the article for publication.

AUTHOR CONTRIBUTIONS

FM and EC conceptualized the study. SA and FM led the literature search and data analysis and drafted the article. SA is responsible for the integrity of the work and is the guarantor. All authors participated in writing and revising the article. All aspects of the study (including design; collection, analysis, and interpretation of data; writing of the report; and decision to publish) were led by the authors.

ACKNOWLEDGMENTS

We thank Jessica Chen for assisting with full text screening of the updated search.