Some familiarity with programming concepts (in any language) will be beneficial, but prior programming experience is not required.
By the end of the course, you will have all the knowledge you need to start using Python competently for automating processes that involve the analysis, modelling, and visualisation of many kinds of data. You will have used Python for practical data-manipulation tasks with data in a variety of formats, including CSV, Excel spreadsheets, and SQL databases. You will have applied powerful tools for clustering, classification, regression, and optimisation in practical settings on small and large data sets. You will understand the elegance and power of the Python language and its ecosystem of packages for data analysis, and you will be well placed to continue learning as you use it day to day.
Day 1: Python basics
Day 1 covers how to use Python for basic scripting and automation tasks, including tips and tricks for making this easy. The syllabus is as follows:
- Why use Python for predictive analytics? What’s possible? Python versus Java, C#, R, Matlab ...
- Setting up your Python development environment (IDE, IPython notebook)
- Python syntax and concepts: an introduction through examples
- Variables, values and operators
- Essential data structures: strings, tuples, lists
- Input and output of text data (including CSV files)
- String methods
- Raising and handling exceptions
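As a taste of the Day 1 topics, here is a minimal sketch that combines strings, tuples, lists, string methods, and exception handling to read CSV-style text. The sample data and the function name `parse_scores` are illustrative assumptions, not taken from the course materials.

```python
# Hypothetical sample data, standing in for the contents of a small CSV file.
SAMPLE_TEXT = """name,score
alice,82
bob,n/a
carol,91
"""


def parse_scores(text):
    """Split CSV-style text into a header tuple, valid records, and skipped names."""
    lines = text.strip().splitlines()
    header = tuple(lines[0].split(","))
    records = []   # list of (name, score) tuples
    skipped = []   # names whose score could not be read as an integer
    for line in lines[1:]:
        fields = line.split(",")
        if len(fields) != len(header):
            # Raise an exception for rows that do not match the header.
            raise ValueError(f"malformed row: {line!r}")
        name, raw_score = fields
        try:
            score = int(raw_score)
        except ValueError:
            # Handle the exception: remember the name and move on.
            skipped.append(name.title())
            continue
        records.append((name.title(), score))
    return header, records, skipped


header, records, skipped = parse_scores(SAMPLE_TEXT)
print(header)
print(records)
print(skipped)
```

Splitting on commas by hand is only suitable for simple data like this; quoted fields and embedded commas need a proper CSV parser.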
Day 2: Further Python essentials
Day 2 introduces further important concepts for real-world scripting in Python. The syllabus is as follows:
- Further important data structures: dictionaries and sets, and their applications
- Modules and packages
- Tour of the amazing standard library, including:
  - Handling CSV files
  - Handling dates and times
  - Fetching data from the web
  - Compressing and uncompressing data
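The Day 2 topics fit together roughly as in the sketch below, which uses a dictionary, a set, and the standard library's `csv`, `datetime`, and `gzip` modules. The sample data and the function name `summarise` are illustrative assumptions, and fetching data from the web is left out so that the example runs without a network connection.

```python
import csv
import gzip
import io
from datetime import date

# Hypothetical sample data, standing in for a CSV file on disk.
SAMPLE_CSV = """date,city,sales
2024-01-05,Sydney,120
2024-01-05,Perth,80
2024-01-12,Sydney,150
"""


def summarise(text):
    """Return total sales per city (a dict) and the distinct dates seen (a set)."""
    totals = {}
    days = set()
    for row in csv.DictReader(io.StringIO(text)):
        city = row["city"]
        totals[city] = totals.get(city, 0) + int(row["sales"])
        days.add(date.fromisoformat(row["date"]))
    return totals, days


totals, days = summarise(SAMPLE_CSV)
span = max(days) - min(days)   # a datetime.timedelta

# Compress the text to bytes and restore it again.
compressed = gzip.compress(SAMPLE_CSV.encode("utf-8"))
restored = gzip.decompress(compressed).decode("utf-8")

print(totals)
print(span.days)
print(restored == SAMPLE_CSV)
```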
Day 3: Essential analytic tools and data formats
The Pandas package is an amazingly productive tool for working with and analysing data in Python. Day 3 gives a thorough introduction to Pandas and related tools for working with different kinds of data, including spreadsheets, time-series data, and SQL databases. The syllabus is:
- Fast, powerful data analysis with Pandas
- Working with time-series data
- Working with missing and noisy data
- Reading and writing data: CSV, Excel, SQL databases, JSON, and spatial formats
- Indexing, grouping, merging, reshaping, summarising data
- Statistical graphics and visualisation of data using Pandas, Matplotlib, and Seaborn
Day 4: Machine Learning
Day 4 introduces three of the most fundamental and powerful techniques for analysing many kinds of real-world data in Python. The datasets are selected from a range of industries: financial, geospatial, medical, and social sciences. The syllabus is:
- Linear and nonlinear regression with statsmodels and scikit-learn, with application to quality assessment and forecasting
- Clustering of data using scikit-learn, with application to outlier detection
- Classification with scikit-learn, with application to diagnosis and prediction
We will supply you with printed course notes and a USB stick containing a complete Python environment based on VirtualBox. This saves time in the course and allows us to focus on using Python rather than installing it. The USB stick also contains kitchen-sink Python installers for multiple platforms, solutions to the programming exercises, several written tutorials, and reference documentation on Python and the third-party packages covered in the course.
Your trainer(s) will be available after the course each day for you to ask any one-on-one questions you like — whether about the course material and exercises or about specific problems you face in your work and how to use Python to solve them.
Food and drink:
We will provide lunch, morning and afternoon tea, and drinks.
The course will run from 9:00 to roughly 17:00 each day, with a one-hour break for lunch and 15-minute breaks for morning and afternoon tea.