30M Factoid Question-Answer Corpus

The 30M Factoid Question-Answer Corpus consists of 30M natural language questions in English and their corresponding facts in the knowledge base Freebase.

The dataset is formatted as a text file, where each line contains:

    <subject> \t <relationship> \t <object> \t natural language question

where \t denotes a tab character, and <subject>, <relationship> and <object> are the Freebase identifiers of the subject, relationship and object of the fact that corresponds to the natural language question.
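As an illustration of the line format, the following is a minimal Python sketch for reading the corpus. It assumes that each line holds exactly four tab-separated fields in the order given above, and that the files are UTF-8 encoded. The names `FactoidQA`, `parse_line` and `read_corpus` are hypothetical helpers, not part of the released dataset. Stray spaces around each tab are stripped, so the code accepts both `a\tb` and `a \t b`.

```python
from typing import Iterator, NamedTuple, TextIO


class FactoidQA(NamedTuple):
    """One corpus line: a Freebase fact plus its natural language question (hypothetical helper)."""
    subject: str
    relationship: str
    obj: str
    question: str


def parse_line(line: str) -> FactoidQA:
    # Assumption: exactly four tab-separated fields per line. Splitting at most
    # 3 times keeps any extra tab inside the question text in the question field.
    fields = line.rstrip("\n").split("\t", 3)
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}")
    # Strip surrounding spaces in case the fields are written as " \t ".
    subject, relationship, obj, question = (f.strip() for f in fields)
    return FactoidQA(subject, relationship, obj, question)


def read_corpus(handle: TextIO) -> Iterator[FactoidQA]:
    # Stream line by line so the ~30M lines never have to fit in memory at once;
    # blank lines are skipped.
    for line in handle:
        if line.strip():
            yield parse_line(line)


if __name__ == "__main__":
    import io

    # Placeholder identifiers for illustration only, not real Freebase IDs.
    sample = io.StringIO("subj_id\trel_id\tobj_id\twhat is the example question ?\n")
    for qa in read_corpus(sample):
        print(qa.relationship, "->", qa.question)
```

To process a downloaded part, open it with `open(path, encoding="utf-8")` and pass the file handle to `read_corpus`, where `path` is wherever the file was saved.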

For a more detailed description, have a look at our paper:

 Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus


30M QA – Part1 (~300MB)

30M QA – Part2 (~200MB)

License