Fig. 5

From: Reliability in evaluator-based tests: using simulation-constructed models to determine contextually relevant agreement thresholds

This model was generated from the results of a trained evaluator analyzing a separate instance of the block moving task and used to gauge the performance of the human evaluators on the original video analysis task. The model is similar to the original. This suggests that a single model is suitable for generalization to other instances of this video analysis task and that model generation only requires a typical dataset.