The regulatory status of health apps that employ gamification

Principal results

Our study has several findings with implications for the regulatory oversight and market compliance of publicly available gamified mHealth apps in the EU. Of the 863 apps analyzed, 69 were gamified mHealth apps. The panel considered 37 of these 69 apps (53.6%) to be MDs or potential MDs, necessitating appropriate clearance. Only 7 (10.1%) of the 69 apps were judged to already carry the CE marking under the regulations as they currently apply, 6 of them in the appropriate risk class. One app was judged to face an up-classification with the transition from MDD to MDR on 31 December 2028 [54]. In total, 31 (44.9%) of the apps were assessed as not or potentially not compliant with the regulatory requirements, which confirms our first hypothesis (H1).

Additionally, only 9 (13.0%) of the apps provide evidence for their effectiveness, among them all seven cleared/approved apps4,22,24,27,29,30,45,55,56. There is therefore a large gap in evidence for the largest segment of gamified apps and serious games. The lack of clinical evidence (or its non-publication by the manufacturers) and the lack of appropriate regulatory approval are concerning, as lay people with no medical training can download these consumer-facing apps.

Only 14.5% (n = 10) of the assessed apps would fall into a low-risk category (Class I under MDR), whereas 39.1% (n = 27) could be classified as Class IIa or higher. MDs that fall into Class I, based on the MDR rules, have an easier pathway to regulatory approval since they can self-declare their conformity with the MDR without the involvement of a notified body31,57. In comparison, higher-risk apps must be developed within a sophisticated and certified quality management system, bringing many quality requirements and increased regulatory burden to market access and involving a notified body for the conformity assessment31,53. This process is presented in detail in Supplemental Material 1. It is highly unlikely that any of the 23 (33.3%) apps that are on the market without a CE-mark but assessed as Class IIa or higher will have undergone any standardized process of design, testing, risk assessment, and post-market surveillance, which are judged necessary by legislatures for apps in these categories.
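The proportions reported in the paragraphs above all appear to use the 69 gamified mHealth apps as their denominator. The short Python sketch below recomputes them from the counts given in the text; the dictionary labels and the function name are our own shorthand, not terms from the study.

```python
# Recompute the reported percentages, assuming every share is taken
# over the 69 gamified mHealth apps identified in the study.
TOTAL_GAMIFIED_APPS = 69

# Counts as stated in the text; the labels are illustrative shorthand.
reported_counts = {
    "MD or potential MD": 37,
    "CE-marked": 7,
    "not or potentially not compliant": 31,
    "evidence of effectiveness": 9,
    "Class I under MDR": 10,
    "Class IIa or higher": 27,
    "Class IIa or higher without CE mark": 23,
}


def share_of_gamified(count: int, total: int = TOTAL_GAMIFIED_APPS) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in reported_counts.items():
    pct = share_of_gamified(count)
    print(f"{label}: {count}/{TOTAL_GAMIFIED_APPS} = {pct:.1f}%")
```

Running it reproduces the figures quoted in the text (53.6%, 10.1%, 44.9%, 13.0%, 14.5%, 39.1%, and 33.3%).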

The reliability of agreement among reviewers on the binary decision (MD or not MD) was fair (Fleiss’ kappa = 0.40), while the reliability of agreement for the risk classification itself (non-MD to Class III) was noticeably lower but still fair (Fleiss’ kappa = 0.33). This could be due to different reasons: firstly, a large variation in the reviewers’ perspectives, indicating the challenge, even for experts, of classifying gamified apps; secondly, a regulatory framework that is ambiguous; and thirdly, a regulatory framework challenged by the continuous emergence of new software approaches and app types that were not anticipated when the regulations were written. We suspect that the reason lies in a combination of all three explanations, considering existing evidence that the practical classification of MDs is ambiguous and discussed by courts58 and among experts59. Additionally, the ruleset for SaMD remains unclear and relatively broad, as stated in MDCG 2019–11 [60]. Some experts have even challenged whether any mHealth apps can be classified as MDR Class I devices, and some competent authorities have taken this view, refusing to accept the registration of any Class I apps61, while other experts and competent authorities have accepted the registration of Class I apps. This ambiguity is highlighted by the classification assessments of individual apps in this study, which often ranged between Class I and IIa (Fig. 4).
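For readers unfamiliar with the statistic, the following Python sketch shows how Fleiss’ kappa is computed from a table of rating counts. It is an illustration only: the toy ratings are invented and are not the study data, the number of raters is hypothetical, and the verbal labels assume the commonly used Landis and Koch bands, with which the "fair" labels above are consistent.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j. Every item must have
    been rated by the same number of raters (at least two)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item, then averaged over items.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)


def interpret(kappa: float) -> str:
    """Verbal label, assuming the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Invented toy data: 5 apps, 4 hypothetical raters, columns = (MD, not MD).
toy_ratings = [[4, 0], [3, 1], [2, 2], [1, 3], [0, 4]]
kappa = fleiss_kappa(toy_ratings)
print(f"kappa = {kappa:.2f} ({interpret(kappa)})")
```

Under these bands, both reported values (0.40 and 0.33) fall into the "fair" range, with 0.40 sitting exactly on its upper boundary.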

Specific risks linked to gamification

Our assessment shows that gamification was used both in approved MDs and in apps with a clear medical purpose that are available through the app stores despite lacking the required clearances/approvals. The second research question centered on whether the observed lack of compliance entails risks for patients that might be related to gamification. The presence or absence of gamified elements does not by itself determine an app’s qualification as a medical device, since the app’s intended purpose, which is defined by the developer, is the crucial factor. However, gamification elements are of great importance for the risk assessment process, an integral part of the MD approval process, particularly when they are integral to the app’s medical purpose. Thus, they may affect the MD’s risk classification. While it is likely that apps that are approved/released as MDs have considered the risks associated with gamification in their design, risk assessment, usability engineering, and mitigation measures as required by the regulations, it is highly unlikely that any of these formalized approaches to safe design have been followed for apps that are on the market without the required clearance/approval.

The in-depth analysis of a subset of eight apps focused on specific risks associated with gamification elements and on harms that could emerge from the failure of such elements. Following the standardized approach defined in ISO 14971 [50], we described risks related to hypothetical scenarios as well as risks that occurred while testing the apps. This approach would be a requirement for the approval of MDs.
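The kind of risk estimation used in such an analysis can be sketched in a few lines of Python. This is a minimal illustration and not the authors’ analysis: the five-level scales, the multiplication of ranks, the thresholds, and the example ratings are all assumptions made here for illustration, since the standard leaves the definition of scales and acceptability criteria to the manufacturer.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical five-level severity scale (an assumption)."""
    NEGLIGIBLE = 1
    MINOR = 2
    SERIOUS = 3
    CRITICAL = 4
    CATASTROPHIC = 5


class Probability(IntEnum):
    """Hypothetical five-level probability-of-harm scale (an assumption)."""
    IMPROBABLE = 1
    REMOTE = 2
    OCCASIONAL = 3
    PROBABLE = 4
    FREQUENT = 5


@dataclass(frozen=True)
class Hazard:
    description: str
    severity: Severity
    probability: Probability
    observed_in_testing: bool  # False means a hypothetical scenario


def risk_level(hazard: Hazard) -> str:
    """Map severity x probability to a coarse level.
    The thresholds are placeholders, not values from ISO 14971."""
    score = hazard.severity * hazard.probability
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


# Example hazards with placeholder ratings, not the study's assessments.
hazards = [
    Hazard("Reward given for implausible input values",
           Severity.CRITICAL, Probability.OCCASIONAL, False),
    Hazard("Less gamified feature neglected by the user",
           Severity.MINOR, Probability.REMOTE, False),
]

for hazard in hazards:
    origin = "observed" if hazard.observed_in_testing else "hypothetical"
    print(f"{risk_level(hazard):6s} [{origin}] {hazard.description}")
```

Recording whether a hazard was observed or only hypothesized mirrors the distinction drawn in the text between hypothetical scenarios and risks that occurred during testing.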

Our results show that for most of the apps examined, the scenarios of failure of gamification elements posed little risk to patients. This is due to the following: (i) the main risk to users was being misled by incorrect medical information, resulting in mild to moderate harm; and (ii) most of the apps studied had a low level of gamification, and we assessed that there was a low association between gamification and risk in the app. Some of these risks could be caused by software bugs, which could be addressed through updates, while others may stem from foundational design flaws requiring more substantive revisions. For apps with a higher degree of gamification, we found a stronger link between risks and gamification elements.

Of the eight apps included in the in-depth analysis, three had a strong connection between gamification and risk and possessed potentially hazardous or uncontrolled aspects to their gamification that could lead to patient harm. The first app (23_CARD) provided an interactive chat tool powered by ChatGPT-4 (a large language model-based chatbot62), branded as an “AI doctor.” This tool claims to offer health advice based on deep learning algorithms, presenting a user-friendly interface adorned with comic-style 3D elements and avatars. Its gamification strategy hinges on a credit-based reward system designed to encourage daily logins and advertisement viewing, using gift timers and push notifications to foster habit-forming user behaviors. The number of interactions with the chatbots is limited by an in-app currency, which could lead to incomplete health assistance. The user would then rely on partial information, potentially resulting in delayed or incorrect medical treatments. Moreover, the app risks disseminating incorrect medical information, which occurred during the testing, misleading users about the severity or treatment of their conditions by providing contradictory advice under the guise of a specialized “AI doctor.” This misinformation could inadvertently direct users to make harmful medical decisions based on incomplete or inaccurate data. Additionally, since some of the app’s features are non-gamified while others are, the app poses the risk that patients neglect these non-gamified features (e.g., blood pressure management), leading to incorrect or delayed treatment.

The second app (43_DIAB), intended for use by people with diabetes, incentivized users to enter their blood glucose measurement data by awarding them ‘points.’ It used these data to calculate insulin doses for self-administration. The app uses only five of the 81 items in the questionnaire for insulin dosage calculation and also rewards entering ‘0’ as a value (e.g., if no insulin was taken). If those constraints are implemented poorly (e.g., no option for adding ‘0’ as a value and still getting rewarded), or if no consistency check of the input is performed, the user could ‘cheat’ to gain more points, e.g., by entering impossible hours of training, implausible carbohydrate intake, or wrong insulin injection dosages. Those wrong values could lead to false analyses and health recommendations by the app and hence to poorer clinical outcomes or, in a worst-case scenario, to high-risk situations for the patient, e.g., hypoglycemia if too much insulin is injected. An additional risk identified is that some features of the app are less gamified than others. This could lead to neglect of the less gamified features, resulting in incorrect or delayed treatment. We acknowledge that we have not had the opportunity to review the material provided for the regulatory process for this specific app. These reports would include both usability and clinical testing and may demonstrate that the approach taken in the app is indeed safe.
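The consistency check discussed above can be illustrated with a short Python sketch in which points are awarded only for entries that pass a plausibility check. Everything in it is hypothetical: the field names, the point value, and above all the numeric bounds are placeholders chosen for illustration, not clinically validated limits, and nothing here reflects how 43_DIAB is actually implemented.

```python
# Hypothetical plausibility bounds per diary field (placeholders only,
# NOT clinically validated limits). Zero is deliberately allowed, since
# '0' can be a legitimate entry (e.g., no insulin taken).
PLAUSIBLE_RANGES = {
    "insulin_units": (0.0, 100.0),
    "carbs_g": (0.0, 1000.0),
    "exercise_hours": (0.0, 24.0),
}

POINTS_PER_ENTRY = 10  # hypothetical reward size


def implausible_fields(entry: dict[str, float]) -> list[str]:
    """Return the names of fields that are missing or out of range."""
    problems = []
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = entry.get(field)
        if value is None or not (low <= value <= high):
            problems.append(field)
    return problems


def award_points(entry: dict[str, float]) -> int:
    """Reward an entry only if every field passes the plausibility check,
    so that implausible values cannot be used to collect points."""
    return 0 if implausible_fields(entry) else POINTS_PER_ENTRY


honest = {"insulin_units": 0.0, "carbs_g": 60.0, "exercise_hours": 1.0}
cheating = {"insulin_units": 4.0, "carbs_g": 60.0, "exercise_hours": 30.0}

print(award_points(honest))          # entry with a legitimate zero
print(award_points(cheating))        # more than 24 h of training in a day
print(implausible_fields(cheating))
```

A real device would additionally need to keep rejected values out of any dose calculation; that step is outside the scope of this sketch.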

The third app (67_NEUR) was rated as having a high ‘gamefeel’, although it only has one gamification element as defined by Sailer et al. (2017). Even if it cannot be described as a fully immersive, full-fledged game due to its limited visual presentation and the absence of relevant game aspects, it still represents an edge case. The gamified character in this app was much more strongly linked to the intended purpose than in the other applications analyzed, leading to a limited immersive experience for the users. As a result, the connection between the gamification elements and potential hazards was higher than in apps that included gamification elements but no immersive experience. One gamification-related risk identified in this and several other apps was that of over-reliance on the apps due to their engaging design and the stimulation effect of gamification. In scenarios of app malfunction, gamification of this nature could exacerbate delays to the start of effective treatment, as the gamification could promote ongoing use. The stronger the gamification was linked to the intended purpose of the app, the stronger the connection between risk and gamification was among the apps analyzed. Whilst no fully immersive SGs could be included in our study, 67_NEUR could point to interesting future research. Since the intended medical purpose of immersive SGs is particularly strongly related to gameplay, the connection between the degree of gamification and risk observed in our study could be investigated better.

Considering the complex and varied link between gamification and risks, Hypothesis 2 could neither be definitively accepted nor rejected. However, a small number of dangerous gamified apps were identified on the market that are not approved/cleared and where there is a clear need for action by the regulatory authorities and the app store in question (Google PlayStore).

Implications for the app stores

Since Google and Apple have an effective duopoly on the mobile market63, they play a pivotal role in distributing consumer-facing health apps. As importers or distributors under MDR31, the app stores are legally obliged to ensure regulatory compliance of the apps they offer38. Our results on gamified apps support previous findings on other types of mHealth apps that neither Apple nor Google adequately meets those requirements34,35. This study shows that the availability of unapproved apps, despite the legal requirement for MD approval, is greater on the Google PlayStore (42.9%) than on Apple’s AppStore (35.4%). This observation is even more pronounced for gamified apps exclusive to the Google PlayStore, of which 66.7% are unapproved despite being assessed in this study as requiring MD approval.

Additionally, the naming convention of the categories in both app stores remains unclear. Most apps considered in this study to be MDs are in the ‘Medical’ category (n = 27), but some are also found in the ‘Health&Fitness’ category (n = 10), and none of the apps in this latter category are adequately cleared/approved. While it was expected that many MDs would be found in the ‘Medical’ category, the high number of MDs in the ‘Health&Fitness’ category was surprising since fitness apps are generally not considered MDs31,33,60,64. A clear categorization could make distinguishing between MD and non-MD apps easier for users and HCPs.

Furthermore, apps communicate their MD status and associated risks in various ways. Some explicitly mention their regulatory classification and potential risks in their descriptions or in supplementary documentation, such as websites or publications, while others do not mention them at all. This inconsistency makes it difficult for users and HCPs to assess an app’s regulatory status and associated risk.

A balanced approach is necessary for app stores to address those challenges. Although regulatory bodies have the responsibility of approving new MDs, their limited resources make it difficult to provide complete oversight of all newly published applications. Here, the role of the app stores as market entry points becomes critical. Since they already have expertise in the automated screening and analysis of applications, they could verify regulatory approval or certification for apps with a medical purpose and implement basic checks to ensure content accuracy34. The aim is to make it impossible to download clearly unsafe and illegal products from the app stores.

Assessment of the US approval status

Comparing the EU market with the US, we found a higher proportion of MD apps in the US. Nonetheless, due to the FDA’s enforcement discretion rules, fewer apps were marketed without the required approval in the US compared to the EU. This goes hand-in-hand with our finding that most assessed apps would fall into Class I and, therefore, be under enforcement discretion. The observation of a more flexible US regulatory classification approach for low-risk apps is consistent with the findings of other studies53. We observed that the FDA product classification database65 provided greater transparency around regulatory decisions and evidence than the EU counterpart database, which is only partially operational and will provide less transparency on completion66.

This has two main implications for the EU. Firstly, the US system enables low-risk Class I digital apps to reach the market more rapidly, potentially spurring faster innovation53. Secondly, the easy access to information on regulatory decisions and evidence from the FDA databases provides openness on app classification decisions and associated evidence, which is lacking in the EU and is not planned for in the EU regulatory framework31. The greater US openness allows all stakeholders, including clinicians and the public, to examine the basis for regulatory approvals.

Implications for the regulation of gamified MDs

Our analysis shows that many gamified mHealth apps would be considered MDs under current regulations. As such, gamified MDs, like non-gamified MDs, must comply with the applicable laws regulating them. While the general compliance of mHealth apps (gamified and non-gamified) has been questioned by other researchers35, the compliance issues may have various causes, including lack of awareness or insufficient guidance documents, although existing regulations provide guidance on certain aspects of SaMD, including audio-visual design46, AI44 and cybersecurity48,67. For gamified apps, this lack of compliance could be caused by a notable lack of guidance documents addressing the intricate relationship between gamification elements, engagement design, and their impacts on health outcomes. The unique characteristics of such apps should be considered in the current regulatory approval process along with other aspects of the app’s safety and performance within the assessment, validation, and regulatory frameworks. A tailored framework, guidance, or standard should delineate the specific risk categories associated with gamification and outline the required steps for design and human factors evaluation, as well as for clinical evaluation, considering the prolonged and dynamic user activity characteristics of gamified apps. Such a framework should complement existing regulatory legislation, not replace it. It is intended to provide guidance on how these products are manufactured and how their risks can be assessed within the framework of the applicable laws. Similar guidance documents exist for other risk aspects of MDs, e.g., cybersecurity48,67 or AI47. These horizontal features are present in many MDs and create additional risks, while the corresponding guidance documents help to assess and mitigate them. Our paper is a step towards showing how this can also be done for gamification. The increasing number of gamified health apps and the ambiguities around current regulations presented in this study underline the necessity for specific guidelines.

Additionally, there is a need for a more transparent and accessible system to build trust in the EU medical device landscape, as has been recognized for implantable medical devices in the past68. This particularly applies to gamified mHealth apps, of which, in the judgment of the expert panel in this study, a surprisingly high proportion are on the market without the required approvals.

Limitations

This analysis has several limitations. The inclusion criteria limit assessed apps to the most populous EU countries, and this potentially limits the generalizability of results to all EU member states. Additionally, due to feasibility reasons, we had to limit our search of the app stores to the categories ‘Medical’ and ‘Health&Fitness’. Thus, we might have missed a small number of apps that have been misclassified by the app stores or developers. Assessment panel members had free choice of whether to download, install, and explore the functionality of the mHealth app or to assess the app based on the developer-provided descriptions, images, and videos (on the app stores, developer websites, and research publications). A minority of apps were downloaded. This is, however, compatible with the assessment of apps based on the developer-stated claims as they relate to the developer-stated functionality, which is also the core approach of the regulatory approval process. Only mobile apps were included in this study.

The definition of gamification and of gamification elements is still an area of discussion among researchers; an agreed consensus definition does not yet exist, and multiple researchers propose different lists of gamification elements4,22,69,70. We used the definition proposed by Sailer et al. (2017)4, which is widely recognized. However, competing definitions also exist9,11,13, which could limit our findings.

The identification of evidence on existing apps could be limited as developers of apps, whether MD or non-MD, are not obliged to put evidence in the public domain. As the EU EUDAMED database is still under development, there is no single source to definitively search a list of approved MDs and their risk class. A detailed search of the app store descriptions, the developer’s websites, research publications, and a general internet search was conducted. The qualification of apps as MDs in the EU is largely based on the manufacturer’s reported intended purpose. It was not always possible for the assessment panel to base their decisions on a stated intended purpose since not all apps clearly stated this. In these cases, the panel’s judgment was based on all available information on claims and functionality.

The analysis of specific risks of gamification was only conducted for a subset of eight apps, which had to be available in German and English, free of charge, available in the Google PlayStore, and accessible without a doctor’s prescription. Additionally, the degree of gamification was judged on a self-developed arbitrary scale. Thus, findings about the intersection of risks and gamification could be limited by the arbitrary character of this scale.
