Mark is joined in this episode of Drill to Detail by Wes McKinney to talk about the origins of the Python pandas open-source package for data analysis and his subsequent work as a contributor to the Kudu (incubating) and Parquet projects within the Apache Software Foundation. They also discuss Arrow, an in-memory data structure specification for use by engineers building data systems and the de facto standard for columnar in-memory processing and interchange.
- Python Data Analysis Library
- "Ibis on Impala: Python at Scale for Data Science"
- Drill To Detail Ep.3 'Apache Kudu And Cloudera's Analytic Platform' With Special Guest Mike Percy
- Apache Arrow homepage
- "Apache Arrow and the '10 Things I Hate About pandas'"
- "Apache Arrow vs. Parquet and ORC: Do we really need a third Apache project for columnar data representation?"
- "Some comments to Daniel Abadi's blog about Apache Arrow"
- Wes McKinney homepage