## Abstract

### Background

Response-adaptive randomization can assign more patients in a comparative clinical trial to the tentatively better treatment. However, because patient allocation adapts to accumulating responses, the samples being compared are no longer independent. At large sample sizes, many asymptotic properties of test statistics derived for comparing independent samples still apply under adaptive randomization, provided that the patient allocation ratio converges asymptotically to an appropriate target. The small-sample properties of commonly used test statistics in response-adaptive randomization, however, have not been fully studied.

### Methods

Simulations are systematically conducted to characterize the statistical properties of eight test statistics under six response-adaptive randomization methods at six allocation targets, with sample sizes ranging from 20 to 200. Since adaptive randomization is usually not recommended for sample sizes below 30, the present paper focuses on a sample size of 30 to give general recommendations on test statistics for contingency tables in response-adaptive randomization at small sample sizes.
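As a rough illustration of this kind of simulation (not the authors' code), the sketch below estimates the rejection rate of an uncorrected Pearson chi-square test under a randomized play-the-winner urn, one of the designs named in the Results. The RPW(1,1) update rule, immediate patient responses, the handling of degenerate tables, and all function names are assumptions made for the example.

```python
import random

# Upper 5% point of the chi-square distribution with 1 degree of freedom.
CHI2_CRIT_1DF_05 = 3.841


def rpw_trial(p_a, p_b, n, rng):
    """Simulate one trial under an assumed RPW(1,1) urn with immediate responses.

    Returns (successes_a, n_a, successes_b, n_b).
    """
    balls = {"A": 1.0, "B": 1.0}   # start with one ball per arm
    p_success = {"A": p_a, "B": p_b}
    n_arm = {"A": 0, "B": 0}
    s_arm = {"A": 0, "B": 0}
    for _ in range(n):
        # Draw a ball (with replacement) to pick the arm.
        prob_a = balls["A"] / (balls["A"] + balls["B"])
        arm = "A" if rng.random() < prob_a else "B"
        other = "B" if arm == "A" else "A"
        n_arm[arm] += 1
        if rng.random() < p_success[arm]:
            s_arm[arm] += 1
            balls[arm] += 1        # success: reward the same arm
        else:
            balls[other] += 1      # failure: reward the opposite arm
    return s_arm["A"], n_arm["A"], s_arm["B"], n_arm["B"]


def pearson_chi2(s_a, n_a, s_b, n_b):
    """Uncorrected Pearson chi-square statistic for a 2x2 table."""
    n = n_a + n_b
    successes = s_a + s_b
    failures = n - successes
    # Assumption: a table with an empty margin counts as "no rejection".
    if n_a == 0 or n_b == 0 or successes == 0 or failures == 0:
        return 0.0
    a, b, c, d = s_a, n_a - s_a, s_b, n_b - s_b
    return n * (a * d - b * c) ** 2 / (n_a * n_b * successes * failures)


def rejection_rate(p_a, p_b, n=30, reps=5000, seed=1):
    """Fraction of simulated trials in which the test rejects at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        if pearson_chi2(*rpw_trial(p_a, p_b, n, rng)) > CHI2_CRIT_1DF_05:
            rejections += 1
    return rejections / reps
```

Calling `rejection_rate(0.5, 0.5)` gives an estimate of the type I error at a sample size of 30, and calling it with unequal success probabilities gives an estimate of power. Swapping in a corrected statistic or a different urn rule only requires replacing the corresponding function.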

### Results

Among all asymptotic test statistics, Cook's correction to the chi-square test ($T_{MC}$) is the best at attaining the nominal size of the hypothesis test. Williams' correction to the log-likelihood-ratio test ($T_{ML}$) gives a slightly inflated type I error and higher power than $T_{MC}$, but it is more robust to imbalance in patient allocation. $T_{MC}$ and $T_{ML}$ are usually the two test statistics with the highest power across simulation scenarios. For $T_{MC}$ and $T_{ML}$, the generalized drop-the-loser urn (GDL) and the sequential estimation-adjusted urn (SEU), respectively, come closest to the correct size of the hypothesis test. Among all sequential methods that can target different allocation ratios, GDL has the lowest variation and the highest overall power at all allocation ratios. The performance of different adaptive randomization methods and test statistics also depends on the allocation target. At the limiting allocation ratio of the drop-the-loser (DL) and randomized play-the-winner (RPW) urns, DL outperforms all other methods, including GDL. When the power of test statistics is compared within the same randomization method but across allocation targets, the powers of the log-likelihood-ratio, log-relative-risk, log-odds-ratio, Wald-type Z, and chi-square test statistics are each maximized at their corresponding power-optimal allocation ratios. Except for the optimal allocation target for log-relative-risk, the other four optimal targets could assign more patients to the worse arm in some simulation scenarios. Another optimal allocation target, $R_{RSIHR}$, proposed by Rosenberger and Sriram (*Journal of Statistical Planning and Inference*, 1997), aims to minimize the number of failures at fixed power using the Wald-type Z test statistic. Among allocation ratios that always assign more patients to the better treatment, $R_{RSIHR}$ usually has less variation in patient allocation, and this variation is consistent across all simulation scenarios. Additionally, the patient allocation at $R_{RSIHR}$ is not too extreme. Therefore, $R_{RSIHR}$ provides a good balance between assigning more patients to the better treatment and maintaining the overall power.
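To make the contrast between allocation targets concrete, the sketch below computes two of them for given success probabilities. The abstract gives no formulas, so both expressions are assumptions taken from how these targets are commonly stated in the adaptive-randomization literature: the limiting proportion of the DL and RPW urns, q_B / (q_A + q_B) with q = 1 - p the failure probability, and the target usually labeled RSIHR, sqrt(p_A) / (sqrt(p_A) + sqrt(p_B)). The function names are hypothetical.

```python
import math


def _check(p):
    """Require a success probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("success probability must lie strictly between 0 and 1")


def urn_limit_share(p_a, p_b):
    """Assumed limiting share of patients on arm A for the DL and RPW urns."""
    _check(p_a)
    _check(p_b)
    q_a, q_b = 1.0 - p_a, 1.0 - p_b   # failure probabilities
    return q_b / (q_a + q_b)


def rsihr_share(p_a, p_b):
    """Assumed RSIHR target share of patients on arm A."""
    _check(p_a)
    _check(p_b)
    root_a, root_b = math.sqrt(p_a), math.sqrt(p_b)
    return root_a / (root_a + root_b)
```

Under these assumed formulas, success probabilities of 0.8 and 0.2 give a share of 0.8 on the better arm at the urn limit but only about two thirds at the RSIHR target, which matches the remark that allocation at this target is not too extreme.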

### Conclusion

Cook's correction to the chi-square test and Williams' correction to the log-likelihood-ratio test are generally recommended for hypothesis testing in response-adaptive randomization, especially when sample sizes are small. The generalized drop-the-loser urn design is the recommended randomization method for its good overall properties. The $R_{RSIHR}$ allocation target is also recommended.