We built our methods by following or adapting the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and the Cochrane Handbook.
Criteria for considering systematic reviews for Epistemonikos Database
In accordance with the Cochrane Collaboration and the PRISMA Statement [11, 12], we have adopted the following definition: ‘A systematic review attempts to collate all empirical evidence that fits pre-specified eligibility criteria to answer a specific research question. It uses explicit systematic methods that are selected with a view to minimising bias, thus providing reliable findings from which conclusions can be drawn and decisions made’.
The operational criteria to consider a systematic review for inclusion in Epistemonikos Database are:
1. Its main purpose is to synthesise primary studies.
2. It states at least one explicit eligibility criterion.
3. It reports searching in at least one electronic database.
Additionally, we include any synthesis of primary studies that does not fulfil the above definition but is judged to add valuable information, such as meta-analyses of individual patient data or unpublished data in which the studies were not identified through a systematic search process.
Evidence syntheses that fulfil criterion 3 but not all of the above criteria are not excluded from Epistemonikos Database; they are classified under a different category (i.e. broad synthesis plus a specific subtype, such as guideline or overview of systematic reviews), which is not the subject of this article.
We exclude reviews that:
Do not address a human health problem.
Synthesise studies that do not evaluate individuals or groups of individuals (e.g. preclinical or animal studies).
Explore a methodological issue (i.e. research about research).
Are only presented as conference abstracts.
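As an illustration only, the inclusion and exclusion criteria above can be read as a small decision rule. The sketch below is a simplification under stated assumptions: the field names, the category labels and the order of the checks are hypothetical, the exclusions are assumed to apply to every candidate record, and the judgement-based inclusion of syntheses without a systematic search is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Review:
    # Hypothetical fields; each mirrors one criterion from the text.
    synthesises_primary_studies: bool   # criterion 1
    states_eligibility_criterion: bool  # criterion 2
    searched_electronic_database: bool  # criterion 3
    human_health_problem: bool = True
    evaluates_people: bool = True
    methodological_only: bool = False
    conference_abstract_only: bool = False


def categorise(r: Review) -> str:
    """Return an illustrative category label for a candidate record."""
    excluded = (
        not r.human_health_problem
        or not r.evaluates_people
        or r.methodological_only
        or r.conference_abstract_only
    )
    if excluded:
        return "excluded"
    if not r.searched_electronic_database:
        # May still be included by judgement (e.g. individual patient
        # data meta-analysis); that judgement is not modelled here.
        return "needs manual judgement"
    if r.synthesises_primary_studies and r.states_eligibility_criterion:
        return "systematic review"
    # Fulfils criterion 3 but not all three criteria.
    return "broad synthesis"
```

In practice these judgements are made by classifiers and human screeners rather than from pre-coded flags; the function only makes the logical structure of the criteria explicit.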
Search methods for identification of systematic reviews
Epistemonikos was developed and is maintained by systematically searching 10 databases on a daily or weekly basis: Cochrane Database of Systematic Reviews, PubMed/MEDLINE, EMBASE, CINAHL, PsycINFO, LILACS, DARE, Campbell Library, JBI Database of Systematic Reviews and Implementation Reports and EPPI-Centre Evidence Library.
We do not restrict our search by language, publication status or publication date (i.e. databases have been searched from inception). In the case of databases of structured summaries (i.e. DARE database), we retrieve the article being summarised and assess it using the same inclusion criteria.
The search strategies were pragmatically adapted from previously reported strategies to retrieve systematic reviews and improved by a team of search experts who analysed the search terms obtained from the text mining of relevant and irrelevant records.
The detailed search strategies currently used in Epistemonikos Database are described in additional file 1.
In order to identify systematic reviews potentially missed by our search in electronic databases we:
Include systematic reviews identified in overviews of reviews, guidelines, scoping reviews or other types of broad syntheses (which are also included in Epistemonikos Database but are classified under a different category).
Check references of selected included reviews.
Run cross-citation searches in Google Scholar and Microsoft Academic.
Evaluate potentially eligible reviews sent by users through the contact page or other means (e.g. email, Twitter).
Data collection and analysis
Selection of reviews
The selection is conducted in two steps. First, all potentially eligible articles are classified as they enter the database using automated methods specifically created for this project (a machine learning classifier for the records with an abstract and a heuristic classifier for the records without an abstract). Secondly, a collaborative network of Epistemonikos users validates this classification. Records with a high probability of being false positives or false negatives are regularly checked by a dedicated team of method experts.
Development of the classifier for records with an abstract
The dataset used to develop the classifier includes all the records with an abstract that had been manually screened by at least one reviewer by January 2019. This dataset comprised 102,011 systematic reviews and 42,321 records not corresponding to systematic reviews, most of them classified before 2016, when the Epistemonikos Database selection process was conducted only by human screeners (earlier versions of the classifier had been in use during 2017 and 2018).
The dataset was arbitrarily divided into a training split and a validation split (80% and 20%, respectively). The training split was used to build a classifier using a supervised-learning random forest, and the validation split was used to test its predictive power. The terms composing the classifier were iteratively analysed and improved by a team of software engineers with expertise in information retrieval, methodologists and information specialists until reaching a stable version. Finally, the results were manually validated against a set of 500 unseen records to check that we had not overfitted the model while tuning the random forest.
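The 80/20 division described above can be sketched as follows. This is a minimal illustration of the split step only: the source does not state how records were assigned to each split, so the seeded random shuffle, the function name and the toy records are all assumptions, and training the random forest itself is not shown.

```python
import random


def split_records(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split into (training, validation)."""
    shuffled = list(records)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy labelled records: (record_id, is_systematic_review)
records = [(i, i % 3 != 0) for i in range(1000)]
training, validation = split_records(records)
print(len(training), len(validation))
```

Holding out a further set of records that is never consulted during tuning, as the 500 unseen records were, guards against a model that fits the validation split by chance.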
Development of the classifier for records without an abstract
Acknowledging the limitations of any classifier using a language-based technique to manage records without an abstract, we approached these as a separate problem. We reviewed the sample iteratively to identify characteristics associated with a high probability of being or not being a systematic review and custom-built a heuristic classifier using specific terms and other characteristics of the records.
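A rule-based classifier of this kind might look like the sketch below. The term lists, the function name and the use of a publication-type field are hypothetical, chosen only to illustrate the approach; the actual terms and record characteristics used in Epistemonikos are not reproduced here.

```python
import re

# Hypothetical term lists; the source does not publish the actual rules.
POSITIVE_TERMS = ("systematic review", "meta-analysis", "metaanalysis")
NEGATIVE_TERMS = ("editorial", "erratum", "letter to the editor", "reply to")


def heuristic_is_systematic_review(title: str, publication_type: str = "") -> bool:
    """Classify a record without an abstract from its title and metadata."""
    # Normalise whitespace and case before matching terms.
    text = re.sub(r"\s+", " ", f"{title} {publication_type}").lower()
    # Negative terms take precedence: a reply to a review is not a review.
    if any(term in text for term in NEGATIVE_TERMS):
        return False
    return any(term in text for term in POSITIVE_TERMS)
```

Giving negative terms precedence is one possible design choice; it favours precision over sensitivity for records such as comments on, or corrections to, published reviews.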
Automated classification (classifiers)
All the records retrieved by the search strategy are immediately processed and automatically classified into included/excluded systematic reviews in the database. Later, the classifications are manually validated by at least one human screener.
All the titles and abstracts included by the classifier are uploaded to Collaboratron™, a screening software specifically developed by Epistemonikos Foundation for this purpose. The documents are screened by at least one human using this tool, starting from the most recent records. The records without an abstract are regularly reviewed by a dedicated team. The full text of the article is retrieved if it is not possible to make a decision based on the title or abstract.
Discrepancies between the classifier and a human screener (i.e. included by the classifier and excluded by the human screener) or between different human screeners are resolved by a senior researcher.
Measures of performance
To estimate the performance of the classifiers, we used the validation set as a gold standard for the machine learning classifier and a convenience sample of 500 unseen records for the heuristic classifier. We calculated the following measures (and their 95% confidence intervals): sensitivity or recall (true positives/(true positives + false negatives)), precision or positive predictive value (true positives/(true positives + false positives)), specificity (true negatives/(false positives + true negatives)) and accuracy ((true positives + true negatives)/total). To estimate the number of misclassified reviews in Epistemonikos Database, we applied these numbers to the total number of records without human validation.
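The four measures can be computed from a confusion matrix as in the sketch below. The source does not state which method was used for the 95% confidence intervals, so the Wilson score interval here is an assumption, as are the function names; any counts passed in are for illustration only.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def performance(tp, fp, tn, fn):
    """Return {measure: (estimate, (lower, upper))} from confusion counts."""
    counts = {
        "sensitivity": (tp, tp + fn),             # TP / (TP + FN)
        "precision": (tp, tp + fp),               # TP / (TP + FP)
        "specificity": (tn, tn + fp),             # TN / (FP + TN)
        "accuracy": (tp + tn, tp + fp + tn + fn), # (TP + TN) / total
    }
    return {name: (k / n, wilson_interval(k, n)) for name, (k, n) in counts.items()}
```

For example, `performance(tp=90, fp=10, tn=80, fn=20)` gives a precision estimate of 90/100 together with its interval. The Wilson interval is used in this sketch because it stays within [0, 1] even when an estimate is close to 100%.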