Table 1 Two sets of variables from SEER’s research dataset

From: Generation and evaluation of synthetic patient data

Feature set | Variables
--- | ---
small-set | AGE_DX, BEHO3V, DX_CONF, GRADE, LATERAL, PRIMSITE∗, SEQ_NUM, SEX
large-set | AGE_DX, BEHO3V, CS1SITE, CS2SITE, CS3SITE, CS4SITE†, CS5SITE†, CS6SITE†, CS7SITE†∘, CS15SITE†∘, CSEXTEN, CSLYMPHN, CSMETSDX, CSMETSDXBR_PUB, CSMETSDXB_PUB, CSMETSDXLIV_PUB, CSMETSDXLUNG_PUB, CSMTEVAL, CSRGEVAL, CSTSEVAL, CSVCURRENT, CSVFIRST, DX_CONF, GRADE, HISTO3V, LATERAL, MAR_STAT, NHIADE, NO_SURG, PRIMSITE, RACE1V, REC_NO, REG, REPT_SRC, SEQ_NUM, SEX, SURGSITF, TYPE_FU, YEAR_DX, YR_BRTH

Note: small-set contains variables with a small number of levels, while large-set contains a large number of variables, including a few with a large number of levels. ∗ In small-set, the variable PRIMSITE is used only for the BREAST dataset, because it has a large number of levels in the LYMYLEUK and RESPIR datasets. † The variable is not present in the LYMYLEUK dataset. ∘ The variable is not present in the RESPIR dataset.
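The footnote markers change which columns each feature set actually contains per dataset. The following is a minimal sketch of those rules. It assumes the three datasets are referred to as BREAST, LYMYLEUK and RESPIR, as in the note. The constant names and the `feature_set` helper are hypothetical illustrations and do not come from the paper.

```python
# Hypothetical encoding of Table 1; names are illustrative, not from the paper.
SMALL_SET = ["AGE_DX", "BEHO3V", "DX_CONF", "GRADE", "LATERAL",
             "PRIMSITE", "SEQ_NUM", "SEX"]

LARGE_SET = [
    "AGE_DX", "BEHO3V", "CS1SITE", "CS2SITE", "CS3SITE", "CS4SITE",
    "CS5SITE", "CS6SITE", "CS7SITE", "CS15SITE", "CSEXTEN", "CSLYMPHN",
    "CSMETSDX", "CSMETSDXBR_PUB", "CSMETSDXB_PUB", "CSMETSDXLIV_PUB",
    "CSMETSDXLUNG_PUB", "CSMTEVAL", "CSRGEVAL", "CSTSEVAL", "CSVCURRENT",
    "CSVFIRST", "DX_CONF", "GRADE", "HISTO3V", "LATERAL", "MAR_STAT",
    "NHIADE", "NO_SURG", "PRIMSITE", "RACE1V", "REC_NO", "REG", "REPT_SRC",
    "SEQ_NUM", "SEX", "SURGSITF", "TYPE_FU", "YEAR_DX", "YR_BRTH",
]

# Dagger marker: variables absent from the LYMYLEUK dataset.
NOT_IN_LYMYLEUK = {"CS4SITE", "CS5SITE", "CS6SITE", "CS7SITE", "CS15SITE"}
# Circle marker: variables absent from the RESPIR dataset.
NOT_IN_RESPIR = {"CS7SITE", "CS15SITE"}

DATASETS = {"BREAST", "LYMYLEUK", "RESPIR"}


def feature_set(name: str, dataset: str) -> list[str]:
    """Return the columns of a feature set for one dataset (hypothetical helper)."""
    dataset = dataset.upper()
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")

    if name == "small-set":
        cols = list(SMALL_SET)
        # Asterisk marker: PRIMSITE is kept in small-set only for BREAST.
        if dataset != "BREAST":
            cols.remove("PRIMSITE")
    elif name == "large-set":
        cols = list(LARGE_SET)
    else:
        raise ValueError(f"unknown feature set: {name}")

    # Drop variables that do not exist in the chosen dataset.
    if dataset == "LYMYLEUK":
        cols = [c for c in cols if c not in NOT_IN_LYMYLEUK]
    elif dataset == "RESPIR":
        cols = [c for c in cols if c not in NOT_IN_RESPIR]
    return cols


if __name__ == "__main__":
    for ds in sorted(DATASETS):
        print(ds, len(feature_set("small-set", ds)), len(feature_set("large-set", ds)))
```

Under these rules, large-set has 40 variables for BREAST, 35 for LYMYLEUK and 38 for RESPIR. small-set has 8 variables for BREAST and 7 for the other two datasets.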