Introduction
Chronic hepatitis B (CHB) is associated with serious complications, including liver failure, cirrhosis, and hepatocellular carcinoma, and is estimated to contribute to 650,000 deaths each year [1]. Hepatitis B e antigen (HBeAg)-negative CHB represents a late phase of infection characterized by persistent viral replication, progressive liver damage and cirrhosis [2,3]. The prevalence of HBeAg-negative CHB is increasing worldwide because of aging of populations [4,5]. Current guidelines recommend the use of nucleos(t)ide analogs (NAs) or conventional or pegylated interferon (PegIFN) alfa as first-line therapy for HBeAg-negative CHB [5,6]. NAs are highly effective at suppressing hepatitis B virus (HBV) DNA replication and are well tolerated; however, relapse occurs frequently after withdrawal of therapy [5,6]. Hepatitis B surface antigen (HBsAg) levels decline very slowly during NA therapy [7]. Consequently, NA therapy must be continued for life to remain effective in the majority of patients [5], placing a burden both on the patients, who must adhere to the daily dosing regimen, and on healthcare systems, which must sustain potentially lifelong treatment programs.
In contrast to NAs, PegIFN alfa-2a induces virological responses that are durable after completion of treatment in approximately one-third of HBeAg-negative patients [8,9] and are sustained during long-term follow up [10,11]. The decline in HBsAg is more rapid during PegIFN than NA therapy [12], and PegIFN appears to be the driver of HBsAg decline and clearance when used in combination with tenofovir [13].
However, PegIFN alfa treatment is often associated with side effects that may lead to discontinuation of therapy in some patients. The ability to identify, before treatment, the patients most likely to respond to PegIFN alfa would be clinically useful: clinicians could select candidates for a finite course of such treatment and recognize those most likely to require long-term treatment with NAs.
The use of on-treatment factors, such as the decline in HBV DNA or HBsAg, to identify patients likely or unlikely to respond to PegIFN alfa is well established in clinical practice [14]. Monitoring HBsAg and HBV DNA during PegIFN alfa-2a therapy can identify patients likely to achieve a post-treatment response [9,15-19]. Although the ability to identify potential responders during treatment is useful, it would be better to identify such patients before initiating treatment. Current guidelines recommend consideration of baseline factors in treatment decisions regarding HBeAg-positive patients only [5]. Limited data suggest that baseline factors can identify HBeAg-negative patients likely to respond to PegIFN alfa-2a [8,10,20,21]; however, the use of baseline factors to predict response is not currently part of routine clinical practice, because individual factors have relatively low negative and positive predictive values for treatment response.
The objective of this analysis was to develop a baseline scoring system to estimate, prior to treatment, the likelihood of an HBeAg-negative patient achieving a durable off-treatment response after receiving PegIFN alfa-2a therapy.
Materials and methods
We conducted a pooled retrospective analysis of data from three studies of PegIFN alfa-2a (PEGASYS®, Roche, Basel, Switzerland) in HBeAg-negative patients: a large phase III randomized study (WV16241) [8]; a phase IV randomized study (PegBeLiver, ML18253, NCT01095835) [9]; and a nonrandomized study (PERSEAS, ML22016, NCT01283074) [22]. Data from 323 HBeAg-negative patients were included (192 from WV16241 [8], 48 from PegBeLiver [9] and 83 from PERSEAS [22]). All patients with HBV genotype B/C infection were derived from WV16241 [8], while patients with genotype D infection were derived from all three studies [8,9,22]. Differences in response rates among the three studies were not statistically significant (P>0.05, chi-square tests).
Selection criteria
Inclusion and exclusion criteria for these studies have been described previously [8,9,22]. Patients were included in the present analysis if they were infected with HBV genotype B, C, or D, were assigned to 48 weeks’ treatment with PegIFN alfa-2a 180 µg/week (with or without lamivudine), and had baseline quantitative HBsAg, HBV DNA and alanine aminotransferase (ALT) data. Patients from WV16241 were included only if they had participated in a long-term follow-up study (WV16866).
Outcome definitions
Two outcomes were used in the analyses, both determined at 48 weeks post-treatment: HBV DNA <2000 IU/mL (virological response), and the combination of HBV DNA <2000 IU/mL plus normal ALT (combined response). Patients with missing HBV DNA or ALT values at 48 weeks post-treatment were considered non-responders.
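The two outcome definitions can be expressed as a short classification routine. The following is a minimal sketch, not the study's analysis code: the function name, the use of an ALT ratio (ALT divided by ULN), and the reading of "normal ALT" as a ratio of 1.0 or less are assumptions made for illustration. It also assumes that a missing ALT value affects only the combined endpoint.

```python
from typing import Optional

HBV_DNA_THRESHOLD_IU_ML = 2000  # cutoff stated in the outcome definitions


def classify_response(hbv_dna_iu_ml: Optional[float],
                      alt_ratio_uln: Optional[float]) -> dict:
    """Classify one patient at 48 weeks post-treatment.

    alt_ratio_uln is ALT divided by the upper limit of normal; treating
    "normal ALT" as a ratio <= 1.0 is an assumption of this sketch.
    Missing values (None) are counted as non-response.
    """
    # Virological response: HBV DNA measured and below 2000 IU/mL
    virological = (hbv_dna_iu_ml is not None
                   and hbv_dna_iu_ml < HBV_DNA_THRESHOLD_IU_ML)
    # Combined response: virological response plus normal ALT;
    # a missing ALT value makes the patient a combined non-responder
    combined = (virological
                and alt_ratio_uln is not None
                and alt_ratio_uln <= 1.0)
    return {"virological": virological, "combined": combined}


if __name__ == "__main__":
    print(classify_response(1500, 0.8))   # both endpoints met
    print(classify_response(None, 0.8))   # missing HBV DNA: non-responder
```

Encoding missing values as non-response in the classifier itself keeps the conservative handling explicit, rather than relying on how missing data happen to be stored.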
Development of baseline prediction score
HBV genotype influences the response to IFN [5]; hence, separate scores were developed for patients infected with genotype B or C (B/C) and those with genotype D. The number of patients with genotype A infection was insufficient to develop a scoring system.
The baseline prediction scoring systems were developed following several steps. First, generalized additive models (GAMs) with the logit link were used to identify appropriate cutoffs via visual inspection for continuous predictors. This included a search for the most statistically significant cutoff for each continuous baseline factor using logistic regression analysis, which could be used to stratify patients into two subgroups corresponding to high and low response rates (virological response and combined response data were considered when defining cutoffs). Baseline factors included in these analyses were age, HBV DNA, HBsAg, ALT level and sex. Next, separate multiple logistic regression (MLR) models were developed for patients infected with genotypes B/C and D, using a backward elimination process that considered only those factors (and cutoffs) associated with a response with a P-value of <0.2 in the univariate analysis. In the backward selection procedure, a P-value of 0.15 was used to select factors that would remain in the models, which allowed for the inclusion of factors with moderate predictive value. Internal validation methods were applied to assess the stability of the selected model, using bootstrap resampling methods [23]. A variable was considered to be a reliable predictor if it was selected in at least 50% of 500 bootstrapped samples with replacement. The discrimination of the model was quantified by calculating the area under the receiver operating characteristics curve (ROC-AUC: target ≥0.7); corresponding 95% confidence interval (CI) and the optimism-corrected ROC-AUCs were determined using the bootstrap samples. The Hosmer-Lemeshow test was applied to assess the goodness of fit of the final model (i.e., no P-value <0.05).
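The discrimination measure used above, the ROC-AUC, equals the probability that a randomly chosen responder has a higher predicted value than a randomly chosen non-responder, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is an illustration only: the function name and the toy data are hypothetical and are not taken from the study, and the bootstrap-based optimism correction is not reproduced here.

```python
def roc_auc(scores, responded):
    """ROC-AUC as the concordance probability between responders and
    non-responders; tied scores contribute 0.5."""
    pos = [s for s, r in zip(scores, responded) if r]
    neg = [s for s, r in zip(scores, responded) if not r]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: integer scores and response indicators
    toy_scores = [0, 1, 1, 2, 3, 3]
    toy_responded = [0, 0, 1, 0, 1, 1]
    print(f"AUC = {roc_auc(toy_scores, toy_responded):.3f}")
```

The pairwise form is quadratic in the number of patients, which is acceptable for cohorts of a few hundred; a rank-based implementation would be preferable for larger data sets.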
Factors that remained in the model of only one endpoint were added to the model for the other, to ensure that the models for both endpoints contained the same set of factors. To devise the scoring system for patients infected with HBV genotype B/C or D, predictive baseline characteristics retained in the MLR models were assigned points, taking into account the magnitude of the regression coefficients estimated in the models [24]: a value of 0 points was assigned to the reference category for each predictive factor; next, points were assigned according to the size of the regression coefficient, using 1 as the unit for 1 point with rounding to integer values.
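The point-assignment rule described above (regression coefficient divided by a unit of 1, rounded to an integer, with 0 points for the reference category) can be sketched as follows. The function name is hypothetical, and the two coefficients shown are the genotype (C vs. B) estimates reported in the Results; coefficients for the other factors are given in Tables 2 and 3 and are not reproduced here.

```python
def points_from_coefficient(beta: float, unit: float = 1.0) -> int:
    """Convert a logistic regression coefficient into integer points.

    The reference category of each factor has beta = 0 and so receives
    0 points. Python's round() rounds exact halves to the nearest even
    integer; how exact halves were handled in the analysis is not stated.
    """
    return int(round(beta / unit))


if __name__ == "__main__":
    # Genotype C vs. B coefficients reported in the Results
    for endpoint, beta in [("virological", 1.1725), ("combined", 0.9366)]:
        print(endpoint, points_from_coefficient(beta))
```

Both coefficients map to 1 point, which is why a single genotype point could be shared across the two endpoints.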
Points assigned for individual characteristics were summed to arrive at a total score for each patient, with higher scores indicating a higher chance of achieving a response. Response rates by prediction scores were determined, including 95% CIs. Prediction characteristics (sensitivity, specificity, positive predictive value, negative predictive value [NPV]) and the positive likelihood ratio were determined for the two endpoints for both prediction scores.
An analysis of patients with genotype B/C infection enrolled in China, Hong Kong and Taiwan is provided in the Supplementary Data.
Applicability of stopping rules
The applicability of the stopping rule recommended by the European Association for the Study of the Liver (no decline in HBsAg and <2-log10 decline in HBV DNA at treatment Week 12) was evaluated after stratifying patients by baseline score.
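For illustration, the Week 12 stopping rule can be written as a simple predicate. This is a sketch under stated assumptions: the function and parameter names are hypothetical, "no decline in HBsAg" is read as a Week 12 value greater than or equal to baseline, and all inputs are assumed to be positive quantitative values in IU/mL.

```python
import math


def meets_week12_stopping_rule(hbsag_baseline: float,
                               hbsag_week12: float,
                               hbv_dna_baseline: float,
                               hbv_dna_week12: float) -> bool:
    """Return True if a patient meets both stopping criteria at Week 12:
    no decline in HBsAg and a <2-log10 decline in HBV DNA."""
    # Assumption: "no decline" means Week 12 HBsAg is not below baseline
    no_hbsag_decline = hbsag_week12 >= hbsag_baseline
    # Decline in HBV DNA on the log10 scale (positive = reduction)
    dna_log10_decline = math.log10(hbv_dna_baseline) - math.log10(hbv_dna_week12)
    return no_hbsag_decline and dna_log10_decline < 2.0


if __name__ == "__main__":
    # Hypothetical patient: HBsAg unchanged, HBV DNA down by about 1.3 log10
    print(meets_week12_stopping_rule(3000, 3200, 1e6, 5e4))
```

Both criteria must hold for the rule to apply, so a patient with any HBsAg decline is not flagged regardless of the HBV DNA change.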
Results
Patients
A total of 778 patients were assigned to treatment, of whom 323 were included in the analysis of baseline characteristics. Reasons for exclusion from the analysis are shown in Supplementary Fig. 1. The study population comprised 157 patients with HBV genotype B/C infection, all of Asian ethnicity, and 166 patients with HBV genotype D infection, all of Caucasian ethnicity (Table 1). Baseline characteristics of patients from each study are shown in Supplementary Table 1. Among genotype B/C patients, the mean age was 38.9 years and 83.4% were male. Among genotype D patients, the mean age was 43.7 years, and 70.5% were male. The prevalence of METAVIR Stage 4 fibrosis was 7.6% among patients with genotype B/C infection who had a pretreatment biopsy result and 2.6% among patients with genotype D infection.
Supplementary Figure 1 Reasons for exclusion from the analysis by study
HBV, hepatitis B virus.
Table 1 Baseline characteristics
Supplementary Table 1 Baseline characteristics of patients in the analysis according to original study
Development of baseline prediction score
Graphic analysis and univariate logistic regression analysis to identify baseline predictive factors and appropriate cutoffs
The GAM plots show the relationships between continuous variables (age, ALT ratio, HBsAg level, and HBV DNA level) and treatment response in patients infected with HBV genotype B/C (Supplementary Fig. 2A-H) and D (Supplementary Fig. 3A-H). Based on these analyses and the search for optimal cutoffs using univariate logistic regression (ULR) analysis, the following cutoffs were considered for further analysis in patients infected with HBV genotype B/C: age ≤30, >30-45 and >45 years; ALT ratio <5 and ≥5 × upper limit of normal (ULN); HBsAg level <1250 and ≥1250 IU/mL; and in patients infected with HBV genotype D: age ≤45 and >45 years; HBsAg <2500 and ≥2500 IU/mL; HBV DNA <35,000 and ≥35,000 IU/mL. Inspection of the GAM plots showed that the relationships between the response variables and HBV DNA level in genotype B/C patients (Supplementary Fig. 2A-H) and ALT ratio in genotype D patients (Supplementary Fig. 3A-H) were not monotonic; therefore, no cutoffs were selected for these two factors in the respective genotype groups.
Supplementary Figure 2 GAM analysis in patients infected with HBV genotypes B or C: Virological response (HBV DNA <2000 IU/mL) 48 weeks post-treatment by (A) age, (B) ALT ratio, (C) HBsAg and (D) HBV DNA; Combined response (HBV DNA <2000 IU/mL and normal ALT) 48 weeks post-treatment by (E) age, (F) ALT ratio, (G) HBsAg, and (H) HBV DNA
ALT, alanine aminotransferase; BL, baseline; DF, degree of freedom; GAM, generalized additive model; HBsAg, hepatitis B surface antigen; HBV, hepatitis B virus; trt, treatment; ULN, upper limit of normal.
Supplementary Figure 3 GAM analysis in patients infected with HBV genotype D: Virological response (HBV DNA <2000 IU/mL) 48 weeks post-treatment by (A) age, (B) ALT ratio, (C) HBsAg and (D) HBV DNA; Combined response (HBV DNA <2000 IU/mL and normal ALT) 48 weeks post-treatment by (E) age, (F) ALT ratio, (G) HBsAg, and (H) HBV DNA
ALT, alanine aminotransferase; BL, baseline; DF, degree of freedom; GAM, generalized additive model; HBsAg, hepatitis B surface antigen; HBV, hepatitis B virus; trt, treatment; ULN, upper limit of normal.
Among the 157 patients with HBV genotype B/C infection, 57 individuals (36.3%) had a virological response and 46 individuals (29.3%) achieved a combined response at 48 weeks post-treatment. Among the 166 patients with HBV genotype D infection, 26 individuals (15.7%) achieved a virological response and 21 individuals (12.7%) achieved a combined response at 48 weeks post-treatment.
Response rates by baseline characteristics included in the ULR analyses are shown by HBV genotype (B/C and D) and type of response in Fig. 1A-D. The ULR analyses of response rates according to baseline characteristics are shown in Table 2 for patients infected with HBV genotype B/C and in Table 3 for patients infected with HBV genotype D.
Figure 1 Genotype B or C patients (A, B) and genotype D patients (C, D) with a virological response (HBV DNA <2000 IU/mL) and a combined response (HBV DNA <2000 IU/mL and normal ALT) at 48 weeks post-treatment by baseline characteristic. Only factors included in the scoring systems are shown
ALT, alanine aminotransferase; HBsAg, hepatitis B surface antigen; HBV, hepatitis B virus.
Table 2 Univariate and multiple logistic regression analysis of factors associated with virological response (HBV DNA <2000 IU/mL) and combined response (HBV DNA <2000 IU/mL and normal ALT) in patients infected with HBV genotype B or C
Table 3 Univariate and multiple logistic regression analysis of factors associated with virological response (HBV DNA <2000 IU/mL) and combined response (HBV DNA <2000 IU/mL and normal ALT) in patients infected with HBV genotype D
Multiple logistic regression
For patients infected with HBV genotype B/C, four baseline factors (age, ALT ratio, HBV genotype and HBsAg level) identified by ULR analysis as being predictive for responses at 48 weeks post-treatment (P<0.2) were included in the MLR model selection procedure (backward elimination with P<0.15). For virological response, age, HBV genotype and HBsAg remained significant after the backward selection process (all P<0.01, Table 2), while age, ALT ratio and genotype were retained after the selection process for a combined response (P<0.1, Table 2). The covariates of the final MLR models were selected in ≥60% of the bootstrap samples, which shows the stability of the models, while the ROC-AUCs and Hosmer-Lemeshow test indicate sufficient discrimination and goodness of fit (Table 2, Supplementary Figs. 4, 5). When all four factors were included in the MLR models for both endpoints, the regression coefficients and odds ratios of the significant factors were very similar (data not shown).
Supplementary Figure 4 ROC curves for HBV DNA <2000 IU/mL at 48 weeks post-treatment in genotype B/C patients for final model and for each individual factor in the model
HBV, hepatitis B virus; HBsAg, hepatitis B surface antigen; ROC, receiver operating characteristics.
Supplementary Figure 5 ROC curves for combined response at 48 weeks post-treatment in genotype B/C patients for final model and for each individual factor in the model
ALT, alanine aminotransferase; HBV, hepatitis B virus; ROC, receiver operating characteristics.
For patients infected with HBV genotype D, three baseline factors were retained in the final model for virological response (Table 3): age, HBsAg level and HBV DNA level (all P<0.1); and two factors were retained in the final model for a combined response (Table 3): age and HBsAg level (all P<0.01). Again, the covariates of the final MLR models were selected in ≥60% of the bootstrap samples showing the model's stability, and the ROC-AUCs and Hosmer-Lemeshow test indicate sufficient discrimination and goodness of fit (Table 3, Supplementary Figs. 6, 7). When all three factors were included in the MLR model for a combined response, the regression coefficients and odds ratios of the significant factors were very similar (data not shown).
Supplementary Figure 6 ROC curves for HBV DNA <2000 IU/mL at 48 weeks post-treatment in genotype D patients for final model and for each individual factor in the model
HBV, hepatitis B virus; HBsAg, hepatitis B surface antigen; ROC, receiver operating characteristics.
Supplementary Figure 7 ROC curves for combined response at 48 weeks post-treatment in genotype D patients for final model and for each individual factor in the model
HBsAg, hepatitis B surface antigen; ROC, receiver operating characteristics.
To formulate the scoring systems, baseline factors were ranked according to the regression coefficients, and points were assigned according to the size of the regression coefficient (Tables 2 and 3) using 1 as the unit for 1 point. For example, the regression coefficients for genotype (C vs. B) were 1.1725 for virological response and 0.9366 for combined response; each, divided by 1 and rounded to an integer value, results in an assigned value of 1 point. In genotype B/C patients, 1 point was assigned for HBsAg <1250 IU/mL and for ALT ≥5 × ULN, although each of these factors was significant for only one of the two post-treatment response variables. In genotype D patients, the final MLR model for virological response was used to devise the scoring system. The resulting scoring systems used to determine an individual patient's probability of achieving a post-treatment response with PegIFN alfa-2a (Table 4) have a maximum score of 5 for genotype C, 4 for genotype B, and 3 for genotype D.
Table 4 Scoring system for predictive baseline characteristics in patients infected with HBV genotypes B or C, or D
Among patients infected with HBV genotype B or C, the distribution of prediction scores in the patient population (N=157) was: 0 points: 5.1% (n=8); 1 point: 25.5% (n=40); 2 points: 39.5% (n=62); 3 points: 21.7% (n=34); 4 points: 7.6% (n=12); and 5 points: 0.6% (n=1). Predictive scores were grouped as follows: 0-1 (n=48), 2 (n=62) and ≥3 (n=47) points to form groups of patients with a low, moderate and high chance of response.
Among patients infected with HBV genotype D, the distribution of prediction scores in the patient population (N=166) was: 0 points: 27.1% (n=45); 1 point: 50.6% (n=84); 2 points: 15.1% (n=25); and 3 points: 7.2% (n=12). Predictive scores were grouped as follows: 0-1 (n=129), 2 (n=25) and 3 (n=12) points to form groups of patients with a low, moderate and high chance of response.
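The grouping of scores into low, moderate and high chance of response can be checked against the reported score distributions. The sketch below uses the patient counts per score given above; the function and variable names are hypothetical.

```python
from collections import Counter


def risk_group(score: int) -> str:
    """Map a baseline prediction score to the reported response group."""
    if score <= 1:
        return "low"
    if score == 2:
        return "moderate"
    return "high"  # scores >= 3 (only a score of 3 is possible for genotype D)


def group_sizes(score_counts: dict) -> dict:
    """Sum patient counts per group from a {score: n_patients} mapping."""
    sizes = Counter()
    for score, n in score_counts.items():
        sizes[risk_group(score)] += n
    return dict(sizes)


# Patient counts per score as reported in the text
GENOTYPE_BC = {0: 8, 1: 40, 2: 62, 3: 34, 4: 12, 5: 1}
GENOTYPE_D = {0: 45, 1: 84, 2: 25, 3: 12}

if __name__ == "__main__":
    print(group_sizes(GENOTYPE_BC))  # {'low': 48, 'moderate': 62, 'high': 47}
    print(group_sizes(GENOTYPE_D))   # {'low': 129, 'moderate': 25, 'high': 12}
```

The same thresholds serve both genotype groups because the low and moderate groups are defined identically, and only the upper range of the score differs.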
Response rates according to predictive score
Response rates at 48 weeks post-treatment increased with increasing baseline score in patients infected with HBV genotypes B/C or D. Among 47 patients infected with HBV genotype B/C with scores ≥3, 33 individuals (70.2%) achieved a virological response and 27 individuals (57.4%) achieved a combined response; among 48 patients with scores of 0-1, eight individuals (16.7%) achieved a virological response and six individuals (12.5%) achieved a combined response (Fig. 2A). The performance characteristics of the two scores are provided in Supplementary Table 2, and the response rates by prediction score are shown separately for genotypes B and C in Supplementary Table 3. The results are consistent across genotypes B and C; for example, for patients with genotype B or C the combined response rate is ≥50% in those with prediction scores ≥3 and ≤25% in those with scores ≤2. Among genotype B patients with a baseline score of 0-1, the NPV for a combined response was 90.9% (30 of 33 patients did not achieve a combined response); among genotype C patients with a baseline score of 1, the lowest possible for genotype C, the NPV for a combined response was 80.0% (12 of 15 patients did not achieve a combined response).
Figure 2 Virological and combined response rates at 48 weeks post-treatment by baseline prediction score in patients infected with HBV genotypes B or C (A) and D (B), and HBsAg clearance rates in patients infected with HBV genotypes B or C, and D (C). Error bars are 95% confidence intervals
ALT, alanine aminotransferase; HBsAg, hepatitis B surface antigen; HBV, hepatitis B virus.
Supplementary Table 2 Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and positive likelihood ratio for the two prediction scores for genotype B/C and D
Supplementary Table 3 Response rates by prediction score in genotype B or C patients
Post-treatment response rates were lower in patients with HBV genotype D infection, and the proportion of patients with a favorable baseline score (3 points) was low. Among 12 patients with a baseline score of 3, six individuals (50%) achieved a virological response and four individuals (33.3%) achieved a combined response, and among 129 patients with a baseline score of 0-1, 13 individuals (10.1%) achieved a virological response and 10 individuals (7.8%) achieved a combined response (Fig. 2B). Among genotype D patients with a baseline score of 0-1, the NPV for a combined response was 92.2% (119 of 129 patients did not achieve a response); however, for patients with a score of 0, the NPV was 100% (45/45), and for those patients with a score of 1, the NPV was 88.1% (74/84).
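The NPV reported above for genotype D patients follows from a 2×2 table of predicted versus observed combined response. The sketch below rebuilds that table from counts stated in the text (166 patients, 21 combined responders, and 10 responders among the 129 patients with scores of 0-1). Treating a score ≥2 as a "predicted responder" is an assumption chosen to match the 0-1 grouping; it may differ from the cutoff used in Supplementary Table 2, and the function name is hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard prediction characteristics from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
    }


# Genotype D, combined response, counts taken from the text
total_patients = 166
total_responders = 21
low_score_patients = 129      # scores 0-1
low_score_responders = 10

fn = low_score_responders                              # responders predicted not to respond
tn = low_score_patients - low_score_responders         # 119 correctly predicted non-responders
tp = total_responders - low_score_responders           # responders with score >= 2
fp = (total_patients - low_score_patients) - tp        # non-responders with score >= 2

if __name__ == "__main__":
    metrics = diagnostic_metrics(tp, fp, fn, tn)
    print(f"NPV = {metrics['npv']:.1%}")  # 92.2%, matching the reported value
```

Only the NPV is checked against the text here; the other values returned by the function depend on the assumed cutoff and should be compared with Supplementary Table 2 before use.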
HBsAg clearance rates also increased with increasing baseline score (Fig. 2C).
Applicability of on-treatment stopping rule by predictive score
When patients were categorized by baseline score, there were no statistically significant differences in the number of individuals (genotype B/C or D) who met the criterion for discontinuing therapy at Week 12 (Supplementary Table 4).
Supplementary Table 4 Baseline prediction score by PARC rule status
Discussion
This analysis shows that HBeAg-negative patients with a high or low chance of achieving a post-treatment response to PegIFN alfa-2a can be identified with genotype-specific baseline scoring systems. The systems employ readily available demographic and laboratory data and could easily be incorporated into routine patient visits to assist clinicians discussing treatment options with patients. With the prediction scores developed here, points are assigned for age, ALT ratio, HBV genotype and HBsAg level for patients with genotype B/C infection, and for age, HBsAg and HBV DNA levels for patients with genotype D infection. Individual patient scores range from 0 to 5 for genotype B/C patients and from 0 to 3 for genotype D patients. Higher scores indicate a greater likelihood of achieving a virological response (HBV DNA <2000 IU/mL) and a combined response (HBV DNA <2000 IU/mL plus normal ALT) 48 weeks after completing a standard 48-week course of treatment with PegIFN alfa-2a. Among genotype B/C patients, patients with scores ≥3 had a high chance (>50%), with a score of 2, a moderate chance (~25%), and with scores of 0 or 1, a low chance (~15%) of achieving a post-treatment response.
A smaller proportion of patients with the more-challenging-to-treat genotype D infection were identified as having a high chance of achieving a post-treatment response. Moreover, the response rates were lower in these individuals than in those with genotype B/C infection. Although genotype D patients are less likely to be suitable candidates for PegIFN alfa-2a, the tool may be most useful in this difficult-to-treat group, because it identifies those few patients who may benefit from PegIFN alfa-2a. In the PegBeLiver study, a total of 12% of patients (6/51) randomized to a standard 48-week course of PegIFN alfa-2a therapy achieved a combined response (which was defined as HBV DNA <3400 IU/mL and normal ALT levels 48 weeks post-treatment); none of these cleared HBsAg [9]. In the present analysis, few patients were identified as having a good chance of a response (baseline score of 3); however, one-third of patients with a baseline score of 3 achieved a combined response at 48 weeks post-treatment. Thus, the results imply that, although overall response rates to PegIFN alfa-2a are quite low in patients with genotype D infection, it is possible to identify the few patients who are most likely to respond on the basis of their baseline characteristics.
The utility of the scoring systems may vary according to genotype. For patients with genotype B infection, who usually have the highest response rates, the score may be most useful to confirm the suitability of patients for PegIFN alfa-2a, whereas for genotype C and particularly D infection, response rates are generally lower and the score may be most appropriate for excluding patients who might otherwise have been considered for PegIFN alfa-2a treatment.
No decline in HBsAg and <2-log10 decline in HBV DNA by Week 12 of treatment with PegIFN alfa (PARC rule) is recommended as a stopping rule because few patients who meet these criteria have achieved a response in clinical trials [25,26]. In this analysis we were not able to correlate the baseline prediction scoring system with the PARC rule.
Strengths of the present analysis include the development of separate scoring systems for patients with genotype B/C and D infection and the use of 48-week post-treatment data in the outcome definition. The ability to identify patients likely to develop off-treatment responses with PegIFN alfa-2a is desirable because it would allow clinicians to target treatment to those patients most likely to respond, while minimizing the number of likely non-responders who are exposed to the potential adverse events of treatment.
Limitations include the retrospective nature of the study and the pooled analysis, the relatively small number of patients (especially genotype D patients), the lack of a score for genotype A patients and the lack of an external validation cohort. The lack of ethnic diversity within each genotype group (i.e., all genotype B/C patients were Asian and all genotype D patients were Caucasian) and the lack of comprehensive information on liver disease severity are also limitations. HBV genotyping may not be reimbursed in some countries, so the need for HBV genotype may be seen as a limitation; however, the subanalysis in Asian patients shows that the score can still be used when this factor is not available. The scoring system must be prospectively validated before it can be recommended for use in routine clinical practice, and it would be reassuring to confirm whether patients identified by these scoring systems also have a high probability of experiencing HBsAg clearance or seroconversion. Future analyses should also attempt to evaluate the baseline scoring system prospectively, while applying on-treatment stopping rules. Application of these prediction scores requires that clinicians have knowledge of the HBV genotype and quantitative HBsAg levels. These tests may not be available in all practice settings, which would limit the utility of this tool.
In conclusion, the proposed baseline scoring systems for HBeAg-negative patients infected with HBV genotype B/C and D use readily available baseline characteristics and enable physicians to identify patients with a low, moderate, or high chance of achieving a post-treatment response to PegIFN alfa-2a. The benefit/risk ratio should be carefully considered before initiating treatment in patients with scores of 0-1, given that these indicate a low chance of success.
What is already known:
- Peginterferon alfa-2a induces durable responses in approximately one-third of hepatitis B e antigen-negative patients with chronic hepatitis B
- Baseline disease factors, such as higher hepatitis B virus DNA and hepatitis B surface antigen levels, have been associated with a reduced likelihood of response
- Criteria to identify specific patients likely or unlikely to respond to peginterferon alfa-2a would permit targeting of resources and sparing of treatment for patients unlikely to respond
What the new findings are:
- For patients with genotype B or C disease, a baseline scoring system from 0 to 5 was developed based on age, genotype, alanine aminotransferase ratio and HBsAg level
- Only 16.7% of genotype B or C patients with scores of 0 or 1 achieved a virological response, while 70.2% of those with scores of 3-5 responded
- For patients with genotype D disease, a scoring system from 0 to 3 based on age, hepatitis B surface antigen level and hepatitis B virus DNA level was predictive of response