A hierarchical framework to evaluate the usability of smartphone health applications
The methodology used in this research consists of four phases. In the first phase, an extensive literature review was conducted to identify the factors that affect the usability of health apps, which yielded the candidate usability parameters. In the second phase, a survey was conducted to determine the importance of the identified usability parameters, and only those with a significant impact were considered further. In the third phase, the relative importance (weight) of each key usability parameter was estimated using a widely used pairwise comparison method, the analytic hierarchy process (AHP). AHP is a multi-criteria decision-making (MCDM) technique that reaches judgments through pairwise comparisons of qualitative and quantitative elements, and it offers a sound method for determining the weights of the parameters used in experts’ reasoning processes14. The output of the third phase is a ranked list of usability parameters for health apps, which forms the basis of the proposed usability evaluation model. Finally, in the fourth phase, the proposed model was compared with existing models. After a thorough review of the study objectives and methodology, ethical approval for this study was granted by the institutional review board of the Department of IT, University of Gujrat. Figure 1 graphically illustrates the research methodology. The details of each phase are discussed below.

Phase 1: systematic literature review to extract usability parameters
To ensure transparency and reproducibility in identifying the pertinent literature, the PRISMA 2020 (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines were followed39. The purpose of the systematic literature review was to identify the recurring usability parameters emphasized in recent studies. The search covered six major scholarly repositories (IEEE Xplore, ScienceDirect, PubMed, SpringerLink, Google Scholar, and Wiley Online Library) and literature published from 2010 to 2024, as this period marks the rapid growth of smartphone-based health applications and related usability research. The following search string was used to identify the relevant literature: (“usability” AND (“health apps” OR “mobile health” OR “mHealth applications”) AND (“evaluation” OR “framework” OR “model”)). The preliminary search returned 87 studies. After titles and abstracts were scanned and duplicates removed, 70 records were retained for full-text evaluation. Studies were included if they (i) focused on the usability of smartphone or mobile health applications, (ii) discussed usability parameters, evaluation frameworks, or models for smartphone apps, and (iii) were published in English. Studies were excluded if they were (a) not relevant to smartphone or mobile health apps, (b) lacking explicit usability criteria, or (c) editorials, grey literature, or non-peer-reviewed papers. Applying these criteria, 54 studies were finalized for detailed qualitative synthesis, whereas 33 studies were discarded because they did not fulfill the inclusion criteria. The selection process is summarized in Fig. 2. This systematic approach keeps the process transparent, reproducible, and consistent with the standards for developing a comprehensive framework.

Phase 2: finding the significance of usability parameters
In Phase 2, the significance of the usability parameters was determined, and only those exceeding a predetermined threshold value were retained for further analysis. To identify the key usability parameters, this study used a survey, as surveys are an effective means of collecting data from a broad population. Participants were asked to rate the importance of each usability parameter on a 5-point Likert scale, with choices ranging from ‘very important’ to ‘not important’. Once participants had completed the survey, the instruments were collected for further analysis. To perform the quantitative analysis of the gathered data, IBM SPSS Statistics version 21.0 was used. Numeric values from 1 to 5 were assigned to the choices of the Likert scale, where 5 was assigned to ‘very important’ and 1 to ‘not important’. The target population of this survey was healthcare stakeholders (patients, doctors, nurses, paramedical staff, etc.) who used health apps and filled out the questionnaire. The non-probability snowball sampling method was used to reach health-app users and professionals with relevant experience. The technique relied on professional networks and academic circles to invite participants, ensuring coverage across multiple health-related groups and general users40.
The target population for the present research consisted of healthcare stakeholders who either use or have used smartphone health applications. Their practical experience and domain knowledge were considered vital for the identification and prioritization of usability parameters. Therefore, the following inclusion criteria were applied: (i) people employed in the healthcare sector or in health-related education (e.g., doctors, pharmacists, paramedics, and medical students); (ii) users from the general population who had at least some prior experience with smartphone health applications, for either professional or personal health management purposes; and (iii) voluntary consent to participate in the study. The exclusion criteria were (a) respondents who had no experience in using health apps and (b) incomplete or inconsistent answers. These groups of participants were chosen to ensure the representation of multiple stakeholders, including perspectives from both the clinical and non-clinical sides, which is crucial for developing a usability framework that reflects the wide-ranging expectations of end users of smartphone health apps. The study was not restricted to particular health applications; instead, participants were instructed to respond on the basis of their experience with any smartphone health app (e.g., fitness tracking, telemedicine, or medication management). This open-ended method captured usability perceptions across a broader spectrum of commonly used apps, making the findings more broadly applicable. Participants’ past experience with smartphone health apps was measured directly by a background section in the questionnaire with items on the duration of app use (less than 1 year, 1–3 years, 3–5 years, more than 5 years).
During the AHP phase, only those participants who had at least three years of app-usage experience were included for making informed and consistent pairwise comparisons.
A total of 195 questionnaires were distributed to participants belonging to different stakeholder groups. Of these, six incomplete or vague questionnaires were excluded, and the remaining 189 were used for analysis. After the data were collected from healthcare professionals and general users, SPSS was used for further analysis. The questionnaire served two main purposes: first, to identify and filter out the less significant usability sub-parameters derived from the literature by applying a threshold mean value; and second, to empirically validate the conceptual framework by translating expert judgments into quantifiable inputs for the AHP weighting process.
The demographic profile of the respondents is as follows. In terms of usage experience, 120 (61.54%) respondents had < 1 year of experience, 26 (13.33%) had < 3 years, 35 (17.95%) had 3 years, 12 (6.15%) had 3–5 years, and 2 (1.03%) had > 5 years of experience using health apps. In terms of qualification, 108 (57.14%) respondents held an MBBS, 21 (11.11%) were pharmacists, 24 (12.69%) were nursing diploma holders, 11 (5.2%) were midwifery diploma holders, 1 (0.52%) was an undergraduate, 3 (1.58%) were graduates, 17 (8.99%) were postgraduates, and 14 (2.11%) held M.Phil. qualifications. In terms of profession, 108 (57.14%) were doctors, 21 (11.11%) were pharmacists, 35 (18.51%) were paramedical staff, and 25 (13.22%) came from other academic backgrounds, as shown in Table 1.
An a priori power analysis was conducted using standard parameters for behavioral and usability research (significance level α = 0.05, statistical power = 0.80, and a medium effect size as recommended by Cohen). According to the analysis, at least 150 participants were needed to detect significant effects with the given power. With 189 valid responses, the final dataset exceeds this criterion, provides adequate statistical power, and supports the representativeness of the findings on the usability parameters of smartphone health apps.
Phase 3: prioritization of usability parameters
The main objective of this phase is to determine the relative weights and rankings of the usability parameters established in Phase 2. The weights of the usability dimensions were estimated using the pairwise comparison approach. Pairwise comparison refers to any method of comparing two entities to determine which is preferred, which possesses more of a particular quantitative attribute, or whether the two entities are similar. The approach originates from the well-known multi-criteria decision-making framework AHP, which is employed in various areas of research14,41. This study used the pairwise comparison method to determine the relative importance of the key usability parameters through the following steps:
Fill out the matrix for pairs of comparisons
In this method, the relative importance of two parameters is rated on a scale from 1 to 9. A pair is assigned a value of 1 if parameter Pi is as important as parameter Pj, and a value of 9 if Pi is much more important than Pj. Intermediate values are used for varying degrees of importance, as shown in Table 2. Where Pi is less important than Pj, the reciprocal values 1/1 to 1/9 are used; 1/9 indicates that Pi is significantly less important than Pj.
A questionnaire was developed and distributed to participants with at least three years of experience using health apps, to assess the index values based on expert opinions. Experts were asked to evaluate the importance of each usability parameter relative to the other usability parameters, using the scale shown in Table 2, and to document their evaluations. If the experts’ estimates diverge, a consensus technique can be applied to minimize the divergence. A cross-matrix C (n × n) is populated row by row with the finally approved estimates. First, the diagonal of C is populated with values of 1, as given by Eq. (1). Second, the upper right half of C is filled until every parameter has been compared with every other parameter. If Pi relative to Pj was rated with importance m (i.e., \(\:{C}_{ij}=m\)), then Pj relative to Pi must be rated 1/m (i.e., \(\:{C}_{ji}=1/m\)). Finally, through Eq. (2), the corresponding reciprocals are filled in the lower left half of C. (Note that the element of C in row i and column j is denoted by \(\:{C}_{ij}\), and that i and j are positive integers ≤ n.)
$$\:{C}_{ij}=1,\:i=j$$
(1)
$$\:{C}_{ij}=\frac{1}{{C}_{ji}\:},\:i\ne\:j$$
(2)
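As an illustration of Eqs. (1) and (2), the following minimal Python sketch fills a reciprocal comparison matrix from upper-triangle judgements. The function name `build_comparison_matrix` and the three example judgements are hypothetical and are not taken from the study's data.

```python
import numpy as np

def build_comparison_matrix(n, upper_judgements):
    """Build an n x n reciprocal matrix from judgements {(i, j): m} with i < j."""
    expected = n * (n - 1) // 2
    if len(upper_judgements) != expected:
        raise ValueError(f"expected {expected} judgements, got {len(upper_judgements)}")
    C = np.ones((n, n))                      # Eq. (1): C_ij = 1 on the diagonal
    for (i, j), m in upper_judgements.items():
        if not (0 <= i < j < n):
            raise ValueError(f"invalid upper-triangle index {(i, j)}")
        if m <= 0:
            raise ValueError("judgements must be positive")
        C[i, j] = float(m)                   # importance of P_i relative to P_j
        C[j, i] = 1.0 / m                    # Eq. (2): reciprocal in the lower half
    return C

# Hypothetical judgements for three parameters on the 1-9 scale.
judgements = {(0, 1): 3, (0, 2): 5, (1, 2): 2}
C = build_comparison_matrix(3, judgements)
print(C)
```

Only the n(n − 1)/2 upper-triangle judgements are supplied; the reciprocals follow mechanically, so the lower half cannot contradict the upper half.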
Determine the normalized comparison matrix
A normalized comparison matrix \(\:{C}^{{\prime\:}}\) is produced by dividing each element of matrix C by the sum of the elements in its column, as shown in Eq. (3).
$$\:{\:C}_{ij}^{{\prime\:}}=\:{C}_{ij}/\sum\:_{i=1}^{n}{C}_{ij}$$
(3)
Determine the relative weights of the parameters
The weight \(\:{w}_{i}\) of each parameter Pi is obtained as the mean of row i of \(\:{C}^{{\prime\:}}\), as shown in Eq. (4).
$$\:{w}_{i}=\:\frac{1}{n}\sum\limits_{j=1}^{n}{C}_{ij}^{{\prime\:}}$$
(4)
As Eq. (5) states, these weights are already normalized: each lies between 0 and 1, and together they sum to 1.
$$\:0\le\:{w}_{i}\le\:1$$
$$\:\sum\limits_{i=1}^{n}{w}_{i}=1$$
(5)
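The normalization and weighting steps in Eqs. (3) to (5) can be sketched in a few lines of Python. This is a minimal illustration with a hypothetical 3 × 3 comparison matrix; the study itself used Expert Choice, and the function name `ahp_weights` is an assumption.

```python
import numpy as np

def ahp_weights(C):
    """Return the column-normalized matrix (Eq. 3) and the row-mean weights (Eq. 4)."""
    C = np.asarray(C, dtype=float)
    C_norm = C / C.sum(axis=0)   # Eq. (3): divide each element by its column total
    w = C_norm.mean(axis=1)      # Eq. (4): mean of each row of the normalized matrix
    return C_norm, w

# Hypothetical reciprocal comparison matrix for three parameters.
C = np.array([
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
])
C_norm, w = ahp_weights(C)
print(w, w.sum())                # the weights satisfy Eq. (5)
```

Because every column of the normalized matrix sums to 1, the row means necessarily sum to 1 as well, which is why no further normalization is needed.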
Verify the consistency of the pairwise comparison results
Saaty states that a consistency ratio of less than 10% is acceptable; otherwise, the pairwise comparisons need to be revised (Lane & Verdini, 1989). The consistency ratio (CR) is given by Eq. (6).
$$\:CR=\:\frac{CI}{RI}$$
(6)
where CI is the consistency index, given by Eq. (7):
$$\:CI=\frac{{\lambda\:}_{max}-n}{n-1}\:\:$$
(7)
Here n denotes the order of the pairwise comparison matrix, and λmax is its maximum eigenvalue.
The random index (RI) of consistency varies in value based on the number of parameters, as shown in Table 3.
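A minimal Python sketch of the consistency check in Eqs. (6) and (7) follows. Table 3 is not reproduced in this section, so the `RANDOM_INDEX` values below are the commonly cited Saaty random indices and should be treated as an assumption; the example matrix is hypothetical.

```python
import numpy as np

# Assumed random index values (commonly cited Saaty values), keyed by matrix order n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def consistency_ratio(C):
    """Return (lambda_max, CI, CR) for a reciprocal comparison matrix C."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    # The principal eigenvalue of a positive matrix is real; take the largest real part.
    lambda_max = float(np.linalg.eigvals(C).real.max())
    if n < 3:
        # Reciprocal matrices of order 1 or 2 are always consistent.
        return lambda_max, 0.0, 0.0
    ci = (lambda_max - n) / (n - 1)          # Eq. (7)
    cr = ci / RANDOM_INDEX[n]                # Eq. (6)
    return lambda_max, ci, cr

C = np.array([
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
])
lambda_max, ci, cr = consistency_ratio(C)
print(lambda_max, ci, cr)
print("acceptable" if cr < 0.10 else "revise the pairwise comparisons")
```

For a perfectly consistent matrix λmax equals n, so CI and CR are both 0; the further λmax exceeds n, the less consistent the judgements are.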
Several software tools exist to implement the AHP approach, including AUTOMAN, Criterium, HIPRE3, and Expert Choice. Expert Choice is regarded as the standard AHP software and was therefore used in this research to implement AHP41. AHP is implemented through the following steps.
(1) Specify research goal
The research goal of this study is to evaluate and rank the important usability parameters of health apps.
(2) Arrange goal and evaluation parameters in hierarchical format
The first level of the hierarchy defines the research goal. The second level defines the major usability parameters, and the third level outlines the sub-parameters corresponding to each usability parameter.
(3) Calculate the relative weights
Calculate the relative weights for parameters and sub-parameters through the four steps discussed above. A pairwise comparison of n parameters or sub-parameters involves n(n − 1)/2 comparisons41.
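To make the n(n − 1)/2 count concrete, the short Python sketch below enumerates the comparisons an expert would need to make. The four parameter names are used purely for illustration, and the helper name `comparison_pairs` is hypothetical.

```python
from itertools import combinations

def comparison_pairs(parameters):
    """List every unordered pair (P_i, P_j) that requires one pairwise judgement."""
    return list(combinations(parameters, 2))

# Illustrative parameter names only.
params = ["efficiency", "effectiveness", "satisfaction", "comprehensibility"]
pairs = comparison_pairs(params)

n = len(params)
print(len(pairs), n * (n - 1) // 2)   # both counts agree
for a, b in pairs:
    print(f"{a} vs {b}")
```

The count grows quadratically with n, which is one practical reason to filter out low-importance sub-parameters before the AHP step.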
Research outcome: proposed usability framework
In this stage, usability parameters were evaluated by end users and domain experts, including doctors, pharmacists, paramedics, medical students, and regular users of smartphone health applications. This combination ensured that the data reflected both the practical exposure of users and professional judgements. Each usability construct, including efficiency, effectiveness, satisfaction, and comprehensibility, was operationalized with directly measurable sub-parameters derived from existing usability frameworks. Participants used a five-point Likert scale to rate how important each sub-parameter was (from 1 = not important at all to 5 = extremely important). After the scores were collected, the mean score of each sub-parameter was calculated to determine its importance, and only sub-parameters whose mean exceeded the threshold value were retained for further analysis. These filtered and validated sub-parameters were then used in the AHP pairwise comparison process to determine relative weights and rank the parameters and sub-parameters. This multi-step process ensured that the operationalization of the constructs was both empirically grounded and methodologically aligned, which strengthens the framework’s reliability and replicability.
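The mean-threshold filtering described above can be sketched as follows. The threshold of 3.0, the sub-parameter names, and the response values are all hypothetical, since this section does not report the actual threshold or raw responses; the study performed this step in SPSS.

```python
import numpy as np

def filter_by_mean(responses, names, threshold):
    """Keep sub-parameters whose mean Likert rating exceeds the threshold.

    responses: array of shape (respondents, sub_parameters) with values 1-5.
    """
    responses = np.asarray(responses, dtype=float)
    if responses.shape[1] != len(names):
        raise ValueError("one column is required per sub-parameter")
    means = responses.mean(axis=0)
    return {name: float(m) for name, m in zip(names, means) if m > threshold}

# Hypothetical ratings from three respondents for three sub-parameters.
names = ["sub_parameter_A", "sub_parameter_B", "sub_parameter_C"]
responses = [
    [5, 2, 4],
    [4, 3, 5],
    [5, 1, 4],
]
retained = filter_by_mean(responses, names, threshold=3.0)  # assumed threshold
print(retained)
```

Only the retained sub-parameters would then enter the pairwise comparison questionnaire.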
Phase 4: comparison with existing models
The proposed model was compared with previously introduced models in terms of the usability parameters that each model recognizes for the evaluation of smartphone health apps. The comparison is presented in Table 9. The table has two columns: the first lists the models, with the proposed model at the top, and the second lists the usability parameters identified in each model. The comparison highlights the presence or absence of each usability parameter in a particular model.
Phase 5: empirical validation of the proposed framework
A pilot validation was performed on two commonly used smartphone health apps to establish the practical relevance and preliminary empirical validity of the proposed usability evaluation framework. To ensure diversity in user interface design and functionality, the chosen applications represent different types of smartphone health apps: Welltory (a diagnostic and monitoring app) and Oladoc (a facilitation app). Three experienced evaluators participated in the validation process. Every evaluator had prior experience in evaluating the usability of mobile apps. In addition, a brief orientation session was given on the structure and usage of the proposed framework. The finalized usability parameters of the framework, weighted using the AHP technique, were converted into a structured evaluation sheet using a 5-point Likert scale (1 = extremely poor usability, 5 = excellent usability). Every evaluator independently interacted with the two apps for at least 20 min, exploring the user interface design, navigational structure, feedback mechanism, information presentation, and overall user experience according to the proposed framework. The weighted usability score (WUS) for each app was calculated by multiplying each evaluator rating by the corresponding AHP weight and summing the products, as given in Eq. (8):
$$\:WUS=\sum\limits_{i=1}^{27}(\mathrm{Rating}_{i}\times\:\mathrm{Weight}_{i})$$
(8)
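Equation (8) amounts to a weighted sum, which the following minimal Python sketch illustrates. It assumes that the AHP weights sum to 1, so that the score stays on the 1 to 5 rating scale; the three ratings and weights are hypothetical stand-ins for the 27 items of the framework.

```python
import numpy as np

def weighted_usability_score(ratings, weights):
    """Compute WUS = sum(rating_i * weight_i) for one evaluator and one app."""
    ratings = np.asarray(ratings, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if ratings.shape != weights.shape:
        raise ValueError("ratings and weights must have the same length")
    if not np.isclose(weights.sum(), 1.0):
        # Assumption of this sketch: weights are normalized to sum to 1.
        raise ValueError("weights are expected to sum to 1")
    return float(np.dot(ratings, weights))

# Hypothetical example with three items instead of 27.
ratings = [4, 5, 3]          # 5-point Likert ratings from one evaluator
weights = [0.5, 0.3, 0.2]    # illustrative AHP weights
wus = weighted_usability_score(ratings, weights)
print(wus)
```

With normalized weights, a WUS close to 5 indicates that the highly weighted parameters received high ratings, whereas low ratings on heavily weighted parameters pull the score down more than low ratings on minor ones.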
To evaluate whether the framework supports consistent scoring among evaluators, inter-rater reliability (IRR) was calculated using the intraclass correlation coefficient (ICC), a widely used reliability indicator in multi-rater usability research.
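The text does not state which form of the ICC was used. As a hedged illustration, the sketch below computes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), from the mean squares of a two-way layout. The choice of this form, the function name `icc_2_1`, and the example ratings are assumptions.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): rows are rated items, columns are evaluators."""
    X = np.asarray(ratings, dtype=float)
    n, k = X.shape                                  # n items, k evaluators
    grand = X.mean()
    ss_rows = k * ((X.mean(axis=1) - grand) ** 2).sum()   # between-item variation
    ss_cols = n * ((X.mean(axis=0) - grand) ** 2).sum()   # between-evaluator variation
    ss_total = ((X - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols                 # residual variation
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))

# Hypothetical ratings: four items scored by three evaluators.
perfect = [[1, 1, 1], [3, 3, 3], [5, 5, 5], [4, 4, 4]]
shifted = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
print(icc_2_1(perfect))   # identical scores across evaluators
print(icc_2_1(shifted))   # same ordering, but systematic offsets between evaluators
```

Because this form measures absolute agreement, evaluators who rank items identically but differ by a constant offset still receive an ICC below 1; a consistency-type ICC would not penalize such offsets.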
