The 30M Factoid Question-Answer Corpus consists of 30M natural language questions in English and their corresponding facts in the knowledge base Freebase.
The dataset is formatted as a text file, where each line contains a <subject>, a <relationship>, an <object> and the corresponding natural language question.
Here <subject>, <relationship> and <object> are the identifiers in Freebase of the subject, relationship and object of the fact that the question is about.
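As a rough illustration, such a file could be read in Python as sketched below. The sketch assumes that the four fields on each line are tab-separated and appear in the order subject, relationship, object, question. Neither the separator nor the order is confirmed here, so check a few lines of the downloaded file first. The names FactoidQA and parse_lines are hypothetical helpers, not part of the dataset.

```python
from typing import Iterable, Iterator, NamedTuple


class FactoidQA(NamedTuple):
    """One line of the corpus: a Freebase fact plus its question."""
    subject: str
    relationship: str
    obj: str
    question: str


def parse_lines(lines: Iterable[str], sep: str = "\t") -> Iterator[FactoidQA]:
    """Yield one FactoidQA per well-formed line.

    ASSUMPTION: fields are separated by `sep` (tab by default) and ordered
    subject, relationship, object, question. Adjust if the file differs.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split at most 3 times so separators inside the question survive.
        parts = line.split(sep, 3)
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed layout
        yield FactoidQA(*parts)


# Placeholder input, not real corpus data.
sample = ["SUBJ_ID\tREL_ID\tOBJ_ID\twhat is an example question?\n"]
for record in parse_lines(sample):
    print(record.subject, record.relationship, record.obj, record.question)
```

Because parse_lines accepts any iterable of strings, an open file handle can be passed directly, so the full file is streamed line by line instead of being loaded into memory.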
For a more detailed description, have a look at our paper:
30M QA – Part1 (~300MB)
30M QA – Part2 (~200MB)