Multiple Sclerosis Discovery -- Episode 83 with Dr. Jerry Wolinsky

Update: 2016-05-17

Description

[intro music]

Host – Dan Keller

Hello, and welcome to Episode Eighty-three of Multiple Sclerosis Discovery, the podcast of the MS Discovery Forum. I’m Dan Keller.

For years, MS researchers have been looking for a measure of MS progression and disability that would be meaningful to clinicians, clinical researchers, patients, and the regulatory agencies that approve new drugs, such as the Food and Drug Administration. To this end, people have looked to composite endpoints that are sensitive to small changes in patient condition and comparable across studies. At the ECTRIMS conference last fall in Barcelona, I met with Dr. Jerry Wolinsky, professor of neurology and director of the MS Research Group at the University of Texas Health Science Center at Houston, who leads us along the path to develop a useful measure incorporating composite endpoints.

Interviewer – Dan Keller

In terms of assessing progression and disability in MS, is there some advantage to having composite endpoints as opposed to the standard tests we’ve looked at?

Interviewee – Jerry Wolinsky

There are several different ways to think about composite endpoints. So one of the things that was introduced a couple of decades ago was the MSFC, the MS Functional Composite. So this was using three different ways of looking at different components of disability in patients with MS. One was a test of cognition. One was a test of fine motor skills in the upper extremities. And one was a test of walking ability, walking speed. That particular composite looked very attractive. There was a fair amount of theoretical and practical work behind instituting the composite, and it was used in a number of trials. And it was based on some, I think, very important statistical analysis.

So what it allowed one to do was to take patients either in a given study or across studies and try to normalize the data that you would get from those patients into something called a z-score, which is a way of ranking and evaluating how far across the group of patients people were scattered. And then one could conceptually add up the z-scores and have a composite number, a single number that you could use to analyze trial data. It seemed to be rather sensitive, and it seemed to work well. But the z-score is dimensionless, and it makes little sense to the practicing clinician, or certainly to patients, to know that you’re minus-two or minus-five or plus-two, and that maybe this has moved by two-hundredths of a point from the time you started in the study until you got to the end of the study.
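As a rough illustration of the z-score idea described here, the sketch below standardizes each component against a reference group and combines the results into one number. It is not the official MSFC scoring procedure: the component names, the reference values, the sign flip for timed tests, and the choice to average rather than sum are all assumptions made for the example.

```python
from statistics import mean, stdev

def z_score(value, reference_values):
    """Distance of one value from the reference mean, in reference SD units."""
    return (value - mean(reference_values)) / stdev(reference_values)

def composite_z(patient, reference, higher_is_worse=("walk_seconds", "peg_seconds")):
    """Average the per-component z-scores into a single dimensionless number.

    Timed tests are sign-flipped (an assumption of this sketch) so that a
    lower composite always means worse function.
    """
    scores = []
    for name, value in patient.items():
        z = z_score(value, reference[name])
        if name in higher_is_worse:
            z = -z
        scores.append(z)
    return sum(scores) / len(scores)

# Hypothetical reference group and patient, invented for illustration only.
reference = {
    "walk_seconds": [4, 6, 8],
    "peg_seconds": [18, 20, 22],
    "cognition_correct": [40, 50, 60],
}
patient = {"walk_seconds": 8, "peg_seconds": 22, "cognition_correct": 40}
print(composite_z(patient, reference))
```

The output is a bare number with no units, which is the interpretability problem raised in the interview.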

So, highly sensitive, seemed very reproducible, maybe even a way to look across studies at different results, but neither patients nor physicians nor, most importantly, the FDA thought that this would be useful in day-to-day practice. So, while we’ve tested that kind of approach in multiple studies, it just hasn’t worked. But it did set up the notion that we could get a little bit more quantitative in things that could be useful on a daily basis, even using some of the same components of that MSFC.

So instead of thinking about how fast could one person walk compared to another, we said, how fast can a person walk using a timed walk of a fixed distance and at one point in time? And then say how much change over an interval of time would represent something that was likely to be reproducible and, more importantly, likely to be correlated with some measure of quality of life that also was deteriorating?

So then we got to the notion–and this was really best utilized thus far in the trials of 4-aminopyridine in terms of registration studies there–to say could you show a 20% improvement or more in this timed walk over an interval of time? And in that study, a certain number of patients were able to show it, and there was also some correlative data done to show that that amount of improvement correlated with things which were meaningful to the individual. And so I think that helped facilitate getting that drug through the registration process with the FDA.

One of the things that my colleagues and I did in looking at one of the trials in progressive disease, specifically the trial of rituximab in primary progressive MS, where we had the data that goes into the MSFC, because it had been collected in the study, was to try to develop a number of different composites. And actually, when you think about it, the main score that we use to rate studies is the EDSS score, and it itself is a composite. It takes into account graded changes in fine motor skills in what we would call the cerebellar system, in the pyramidal system, in the sensory systems, and cognitive systems. It’s just that the boundaries in moving in these individual functional scales are a little bit more subjective in terms of going from a zero to a one, two, or three. And then the scale itself is rather complicated in terms of how it is put together to come to the final score, the Expanded Disability Status Scale score. But it’s very well accepted by neurologists, and it’s accepted by the regulatory authorities as the standard.

So we took our standard changes on EDSS, which in this particular study had not shown efficacy across the group as a whole. So we looked at that in the placebo arm, and didn’t contaminate that with the treated arm, to say what was the rate of change on the EDSS alone? But then we also said, what about a 20% change over baseline that had occurred in an individual patient over intervals of testing and not just one that occurred at a particular setting compared to baseline, but one that continued to be seen at the next 3 months and the next 3 months. So it looked like it was a sustained change in the same way that we use EDSS now in trials to talk about sustained or accumulated permanent disability, at least over some interval of time.
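The idea of a 20% change from baseline that is then seen again at the following three-month visits can be sketched as a small function. This is a minimal illustration, not the trial's actual analysis: the function name, the assumption that a higher value means worse performance (as with a timed test), and the sample values are all hypothetical.

```python
def first_confirmed_worsening(baseline, visits, threshold=0.20, confirmations=1):
    """Return the index of the first visit at which worsening is sustained.

    A visit counts as worsened when its value is at least `threshold` above
    baseline (assumes higher = worse). The worsening is treated as sustained
    only if the next `confirmations` consecutive visits are also worsened.
    Returns None if no sustained worsening is found.
    """
    cutoff = baseline * (1 + threshold)
    for i in range(len(visits) - confirmations):
        window = visits[i : i + confirmations + 1]
        if all(value >= cutoff for value in window):
            return i
    return None

# Hypothetical timed-test results (seconds) at successive 3-month visits.
visits = [11.0, 12.5, 11.0, 12.4, 13.0, 12.6]
print(first_confirmed_worsening(10.0, visits))                   # one confirming visit
print(first_confirmed_worsening(10.0, visits, confirmations=2))  # two confirming visits
```

In this made-up series the isolated jump at the second visit is ignored because the next visit falls back below the cutoff; only the later run of consecutive worsened visits counts.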

So we said, okay, we can construct a progression curve based on that. And then we said, what does that look like? And said, well, this has some dimensions to it that are interesting. And we did the same thing with the Timed 25-Foot Walk, and we didn’t fool around with the PASAT [Paced Auditory Serial Addition Test], the cognitive measure, because nobody likes it. Patients don’t appreciate it, and it’s rather prolonged and not a simple test to use. And this is one that probably could be easily changed out with other cognitive tests that are probably as reliable and easier to complete. And we looked at how did patients progress using that change in the timed walk and said, well, that’s interesting too.

And then we went into the group as a whole and said, okay, how many patients changed on the EDSS over three months, confirmed? How many over six months, confirmed? How many did this on the Timed 25-Foot Walk? Did it cross the 20% threshold? How many did this on the 9-Hole Peg Test and, again, crossing the 20% threshold? And who were these patients, more importantly? So then we could develop a series of Venn diagrams–if you will, circles–that showed who did it on just one test, who did it on all tests, who did it on two tests? And looked to see could we get a larger and larger proportion of the population that were showing progression?
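The Venn-diagram bookkeeping described here amounts to sorting patients by the exact combination of tests on which they progressed. The sketch below shows one way to do that with sets; the patient IDs and the membership of each set are invented for illustration and do not come from the study.

```python
def venn_regions(progressed):
    """Group patients by the exact set of measures on which they progressed.

    `progressed` maps a measure name to the set of patient IDs that crossed
    that measure's progression threshold. The result maps each combination
    of measures (a frozenset) to the patients who progressed on exactly
    those measures and no others.
    """
    all_patients = set().union(*progressed.values())
    regions = {}
    for patient in sorted(all_patients):
        combo = frozenset(name for name, ids in progressed.items() if patient in ids)
        regions.setdefault(combo, set()).add(patient)
    return regions

# Hypothetical patient IDs, for illustration only.
progressed = {
    "EDSS": {1, 2, 3},
    "T25FW": {2, 3, 4},
    "9HPT": {3, 5},
}
for combo, patients in venn_regions(progressed).items():
    print(sorted(combo), sorted(patients))
```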

And the answer is: We could. And for some tests, the incremental change was small, and for other tests the incremental change was relatively large. But when we looked at the results of the study, then, using different kinds of composites, you fail just on EDSS; you fail on EDSS, or you fail on Timed 25-Foot Walk; you fail on Timed 25-Foot Walk or 9-Hole Peg Test—we don’t care about EDSS in that one—you fail on all three. We could see that we could increase the sensitivity, that is, the number of people who were showing progression, using these kinds of composites, and hoped, therefore, that we could increase the sensitivity to drug effect.
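To make the comparison of composites concrete, the sketch below computes the proportion of a group flagged as progressing under different composite definitions. It assumes that a patient fails a composite by progressing on any one of its components; that reading, along with the patient IDs and the group size, is an assumption of this example rather than something stated in the interview.

```python
def composite_rate(progressed, components, n_patients):
    """Proportion of patients progressing on at least one listed component."""
    flagged = set().union(*(progressed[name] for name in components))
    return len(flagged) / n_patients

# Hypothetical data: which of 10 patients progressed on each measure.
progressed = {
    "EDSS": {1, 2, 3},
    "T25FW": {2, 3, 4},
    "9HPT": {3, 5},
}
composites = {
    "EDSS alone": ["EDSS"],
    "EDSS or T25FW": ["EDSS", "T25FW"],
    "T25FW or 9HPT": ["T25FW", "9HPT"],
    "any of the three": ["EDSS", "T25FW", "9HPT"],
}
for label, components in composites.items():
    print(label, composite_rate(progressed, components, n_patients=10))
```

With these made-up numbers, each added component either leaves the rate unchanged or raises it, which mirrors the point that some tests add little and others add a lot.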

So then we did the next step, which was to take both the placebo arm and the treated arms and say, okay, how did the curves change? So the overall curve showed no statistical benefit with the EDSS, until you went to subgroup analysis. And that was reported in the original paper. But when we modeled this, of course, the overall didn’t show the statistical effect. That’s where we were starting from. When we added in the Timed 25-Foot Walk, it looked like there was a better split. In fact, the effect size for the treatment improved. And this was not across subgroups, but across the entire population.

Interestingly enough, we probably got the biggest punch by throwing out the EDSS and just using the 9-Hole Peg Test and the Timed 25-Foot Walk. That has some advantages, because they can be done by anyone. In fact, they probably could be done remotely, or we probably could convert it to how many steps a day did you take and have your watch feed the message to us over the course of a day. There are a number of interesting different approaches that can be taken to this kind of concept, and some of these are being pursued by a collaborative group spearheaded through the NIH, as well as a private consortium, looking at newer ways to measure progression.

The good news is, I’m sure we’ll find things that are more sensitive. The good news is, I’m sure we’ll find things that are easier to apply. Another part of the good news is that the additional work increasingly is carried out with some representatives from the regulatory authorities to give us a feeling for what they really want to see. And what they would like to see is not just that we have composites that are sensitive and reproducible, but that each of those composites, before we use it, has been shown to have some relevance for what patients complain of and what patients are looking for. So that’s the good news.

The bad news is we have to not only develop them, validate them, show that they work, we’ll probably have to constantly be comparing them back