Processing Large XML Wikipedia Dumps that won't fit in RAM in Python without Spark

Python's ElementTree module can read an XML file of any size, limited only by how much time you have to process it. Its iterparse function streams the document, so unlike a DOM parser, it never needs to load the entire XML document into memory. This video shows how all of Wikipedia can be processed in Python without a large amount of RAM.
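Here is a minimal sketch of the streaming idea, using only the standard library's xml.etree.ElementTree.iterparse. It is not necessarily the code from the linked notebook. The helper names strip_ns and iter_page_titles are hypothetical, and the tiny SAMPLE document is made up to mimic the mediawiki/page/title layout of a Wikipedia dump. The namespace URI version in it is an assumption. The important step is calling root.clear() after each page is handled, so finished pages do not pile up in memory.

```python
import io
import xml.etree.ElementTree as ET


def strip_ns(tag):
    """Hypothetical helper: drop a '{namespace}' prefix, e.g. '{uri}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_page_titles(source):
    """Hypothetical helper: yield the <title> text of each <page>, streaming.

    `source` is a filename or a binary file object. Both 'start' and 'end'
    events are requested so the root element can be captured from the
    very first event.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        # An 'end' event means the whole <page> subtree has been parsed.
        if event == "end" and strip_ns(elem.tag) == "page":
            title = None
            for child in elem:
                if strip_ns(child.tag) == "title":
                    title = child.text
                    break
            yield title
            # Drop every finished child of the root so memory use stays flat.
            root.clear()


# Made-up miniature document in the MediaWiki export layout (assumption).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <siteinfo><sitename>Example</sitename></siteinfo>
  <page><title>Alpha</title><ns>0</ns><revision><text>...</text></revision></page>
  <page><title>Beta</title><ns>0</ns><revision><text>...</text></revision></page>
</mediawiki>"""

titles = list(iter_page_titles(io.BytesIO(SAMPLE)))
print(titles)  # ['Alpha', 'Beta']
```

For a real dump you would pass the path to the decompressed .xml file, or a file object from bz2.open(path, "rb") to read the compressed dump directly. Memory then depends on the size of one page rather than the whole file.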

My blog post for this video:

https://www.heatonresearch.com/2017/03/03/python-basic-wikipedia-parsing.html

The code for this video can be found here:

https://github.com/jeffheaton/present/blob/master/youtube/read_wikipedia.ipynb
