Python

Python is the primary programming language used in the AI/ML community, so we need to get it properly installed on our computer. This page covers that, along with how to install third-party libraries.

Installation

(Comic sourced from XKCD's website (Link))

Unfortunately, Python can be one of the most maddeningly frustrating pieces of software to install. Part of the challenge is that there are too many ways to install Python, and the different installations have a tendency to trip over one another, as the comic above indicates. Here are just a few ways that we can install Python:

  • It comes packaged with Apple's Xcode developer tools, so if you followed the CLI guide as a Mac user, you probably already have one instance of Python installed on your Mac.
  • You can download an "official" installer from the official Python website.
  • Windows users can install instances of Python from the Microsoft Store.
  • You can use package managers like Homebrew on Mac or Chocolatey on Windows to install Python.

It's frankly a freakin' mess. You can scour the internet and not find a consensus on the best method of installation!

In my opinion, the cleanest way you can go is via that second bullet: downloading the Python installer from the official Python website. This option works for both Mac and Windows users. Simply select the desired version and download the installer.

Note: As mentioned a few times now, I do not recommend installing the very latest version of Python at any given time. Generally speaking, I've found that third-party libraries need time to catch up to each new major Python release, so you generally can't go wrong by staying one version behind. As of this guide's latest update (Q1 2024), the latest version of Python is 3.12, so I would advise installing Python 3.11.

You can verify your installation of Python by running the following command:

python3 --version

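This should display the version of Python that you installed. On Windows, the official installer typically registers the command as python or py rather than python3, so try python --version if the command above is not found.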
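Because several Python installations can coexist on one machine, it can also help to ask Python itself which interpreter is actually running. The following is a minimal sketch using only the standard library; the function name describe_interpreter is a hypothetical helper chosen for illustration, not something that ships with Python.

```python
import sys


def describe_interpreter():
    """Return the path and version of the interpreter running this code."""
    # sys.version_info holds (major, minor, micro, ...); keep the first three.
    version = ".".join(str(part) for part in sys.version_info[:3])
    return {"executable": sys.executable, "version": version}


if __name__ == "__main__":
    info = describe_interpreter()
    print(f"Interpreter: {info['executable']}")
    print(f"Version:     {info['version']}")
```

If the path printed here is not the installation you expected (for example, it points at the copy bundled with the Xcode tools rather than the one from the official installer), that is a sign that more than one Python is on your system.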

Installing Third Party Libraries

Think about the smartphone in your pocket. When you first purchased it, it probably didn't come with much on it. Maybe a few default apps like phone, maps, or messages. This is fine for baseline stuff, but if you want to do more, then you've had to install third party apps. These include apps like Uber, Instagram, and whatever else you've installed.

Python is very much in the same vein. While Python itself comes with some default functionality, we developers have to add third-party libraries to really make the most of it. This is a very common thing to do!

Python libraries are published to the Python Package Index (PyPI) and are installed using the pip3 command. If you're wondering how to install pip3, you just did it without knowing! pip3 is bundled with the official Python installers. So let's say you want to install the Pandas library (which we'll cover down below). You would simply need to run the following command:

pip3 install pandas

And if you wanted a very specific version of a library, you could set it like the following:

pip3 install pandas==2.0.1
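When more than one Python is installed, the pip3 on your PATH may belong to a different installation than the python3 you run. One common way around this is to invoke pip as a module of a specific interpreter (python3 -m pip install ...). The sketch below builds that command from inside Python; pip_install_command is a hypothetical helper name, and the install itself is left commented out so that nothing is installed by accident.

```python
import subprocess
import sys


def pip_install_command(package, version=None):
    """Build a pip command tied to the interpreter running this script."""
    # Pin with "==" only when a specific version was requested.
    requirement = f"{package}=={version}" if version else package
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    command = pip_install_command("pandas", "2.0.1")
    print("Would run:", " ".join(command))
    # Uncomment the next line to actually perform the install:
    # subprocess.run(command, check=True)
```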

Keep in mind that the versions of the software you use are very important! Libraries change over time, and a new release can behave differently from the one you developed against. To keep your local environment in sync with whatever might go out to production, you'll want to pin these versions explicitly.
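As a rough illustration of what "staying in sync" means, the sketch below compares a pinned requirement such as pandas==2.0.1 against whatever is installed locally, using the standard library's importlib.metadata module. The helper names parse_pin and check_pin are hypothetical, and the comparison is a plain string match, so it assumes the pin is written exactly as the installed version reports itself.

```python
from importlib import metadata


def parse_pin(requirement):
    """Split a pinned requirement such as 'pandas==2.0.1' into (name, version)."""
    name, separator, version = requirement.partition("==")
    if not separator or not name.strip() or not version.strip():
        raise ValueError(f"Not a pinned requirement: {requirement!r}")
    return name.strip(), version.strip()


def check_pin(requirement):
    """Return 'ok', 'mismatch', or 'missing' for a pinned requirement."""
    name, wanted = parse_pin(requirement)
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return "missing"
    # Simple string comparison; '2.0' and '2.0.0' would count as a mismatch.
    return "ok" if installed == wanted else "mismatch"


if __name__ == "__main__":
    for pin in ["pandas==2.0.1"]:
        print(pin, "->", check_pin(pin))
```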

Recommended Third Party Libraries

The following list is certainly not exhaustive, but these are very common libraries used within the AI/ML community:

  • numpy: NumPy is a library for efficient numerical computation on arrays, something that base-level Python simply isn't good at doing. NumPy is found at the core of many other libraries, too, including the important next one.
  • pandas: Pandas is a library that allows us to interact with tabular data in a lot of different ways. Beyond simple views and aggregations, you can also slice, filter, and reshape the data. Pandas is extremely important to the AI/ML community, as many machine learning libraries are set up to work with Pandas dataframes. Learning Pandas is a skill in and of itself, and you can find a great, free introductory course to Pandas at this link.
  • matplotlib: This Python library is great for creating visualizations of your data. It is often used in tandem with another library called seaborn, which is built on top of matplotlib and is used to "prettify" the graphs it generates.
  • scikit-learn: This is one of the most popular general-purpose machine learning libraries in Python. In addition to having various machine learning algorithms, it also has functionality to do things like data preprocessing, model selection, and evaluation metrics.
  • torch: Also known as PyTorch, this is the major Python library that most people use today to build deep learning neural network architectures.
  • transformers: Maintained by the good folks at Hugging Face, this library is absolutely essential to anybody interested in working in the natural language processing (NLP) space, including with large language models (LLMs).
  • datasets: Another Hugging Face library, this Python client does just what it sounds like: it allows us to download datasets from the Hugging Face Hub.
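To see which of the libraries above are already available to your interpreter, a small standard-library check can be sketched as follows. The RECOMMENDED mapping and the find_missing helper are hypothetical names used for illustration. One detail worth knowing is that the name you install is not always the name you import: scikit-learn is installed as scikit-learn but imported as sklearn.

```python
from importlib import util

# Maps the name used with pip to the name used in an import statement.
# They usually match, but scikit-learn is imported as "sklearn".
RECOMMENDED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "torch": "torch",
    "transformers": "transformers",
    "datasets": "datasets",
}


def find_missing(libraries):
    """Return the pip names of libraries that cannot be imported here."""
    # find_spec looks a module up without actually importing it.
    return [
        pip_name
        for pip_name, module_name in libraries.items()
        if util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = find_missing(RECOMMENDED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All recommended libraries are installed.")
```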