
Data Visualization Explained (Part 4): A Review of Python Essentials


This is the fourth article in my data visualization series.

Up to this point in my data visualization series, I have covered the foundational elements of visualization design. These principles are essential to understand before actually designing and building visualizations, as they ensure that the underlying data is done justice. If you have not done so already, I strongly encourage you to read my previous articles (linked above).

At this point, you are ready to start building visualizations of your own. I will cover various ways to do so in future articles, and in the spirit of data science, many of these methods will require programming. To ensure you are ready for this next step, this article consists of a brief review of Python essentials, followed by a discussion of their relevance to coding data visualizations.

The Basics—Expressions, Variables, Functions

Expressions, variables, and functions are the primary building blocks of all Python code—and indeed, code in any language. Let’s take a look at how they work.

Expressions

An expression is a piece of code that evaluates to some value. The simplest possible expression is a constant value of any type. For instance, below are three simple expressions: the first is an integer, the second is a string, and the third is a floating-point value.

7
'7'
7.0

More complex expressions often consist of mathematical operations. We can add, subtract, multiply, or divide various numbers:

3 + 7
820 - 300
7 * 53
121 / 11
6 + 13 - 3 * 4

By definition, Python evaluates each of these expressions into a single value, following the mathematical order of operations summarized by the acronym PEMDAS (Parentheses, Exponents, Multiplication, Division, Addition, Subtraction) [1]. For example, the final expression above evaluates to the number 7. (Do you see why?)
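If you would like to check results like these yourself, one option is to wrap each expression in print(). The short sketch below is an illustration rather than part of the original examples. It also uses the built-in type() function to show that the / operator always produces a floating-point value, while +, -, and * applied to integers produce an integer.

```python
# Each expression is evaluated to a single value before print() displays it.
print(3 + 7)             # 10
print(121 / 11)          # 11.0 (division with / always gives a float)
print(6 + 13 - 3 * 4)    # 7 (3 * 4 is evaluated first, then 6 + 13 - 12)
print((6 + 13 - 3) * 4)  # 64 (parentheses change the order)

# type() reports what kind of value an expression produced.
print(type(6 + 13 - 3 * 4))  # <class 'int'>
print(type(121 / 11))        # <class 'float'>
```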

Variables

Expressions are great, but they aren’t super useful by themselves. When programming, you usually need to save the value of certain expressions so that you can use them in later parts of your program. A variable is a container which holds the value of an expression and lets you access it later. Here are the exact same expressions as in the first example above, but this time with their values saved in variables:

int_seven = 7
text_seven = '7'
float_seven = 7.0

Variables in Python have a few important properties:

  • A variable’s name (the word to the left of the equal sign) must be a single word made up of letters, digits, and underscores, and it cannot start with a digit. If you need to include multiple words in your variable names, the convention is to separate them with underscores (as in the examples above).
  • You do not have to specify a data type when working with variables in Python, as you may be used to doing if you have experience programming in a different language. This is because Python is a dynamically typed language.
  • Some other programming languages distinguish between the declaration and the assignment of a variable. In Python, a variable is created the first time you assign a value to it, so there is no need for the distinction.
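As a small illustration of dynamic typing, the sketch below assigns values of three different types to the same variable, one after another. The name reading is hypothetical and chosen only for this example.

```python
# No type is declared; Python infers it from the assigned value.
reading = 7
print(type(reading))   # <class 'int'>

# The same name can later be bound to a value of a different type.
reading = '7'
print(type(reading))   # <class 'str'>

reading = 7.0
print(type(reading))   # <class 'float'>
```

Reusing one name for different types is allowed, but it can make code harder to follow, so it is usually clearer to give each value its own descriptive name.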

When a variable is assigned, Python always evaluates the expression on the right side of the equal sign into a single value before storing it in the variable. (This connects back to how Python evaluates complex expressions.) Here is an example:

yet_another_seven = (2 * 2) + (9 / 3)

The variable above is assigned the value 7.0, not the compound expression (2 * 2) + (9 / 3). The result is a floating-point value because the / operator always produces one.
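One way to confirm this is to print the variable and its type, then reuse it in a later expression. This is only a sketch; the name doubled is introduced here for illustration.

```python
yet_another_seven = (2 * 2) + (9 / 3)

print(yet_another_seven)        # 7.0
print(type(yet_another_seven))  # <class 'float'>

# The stored value can be reused in later expressions.
doubled = yet_another_seven * 2
print(doubled)                  # 14.0
```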

Functions

A function can be thought of as a kind of machine. It takes something (or multiple things) in, runs some code that transforms the object(s) you passed in, and returns exactly one value. In Python, functions are used for two primary reasons:

  1. To manipulate input variables of interest and come up with an output we need (much like mathematical functions).
  2. To avoid code repetition. By packaging code inside of a function, we can just call the function whenever we need to run that code (as opposed to writing the same code again and again).

The easiest way to understand how to define functions in Python is to look at an example. Below, we have written a simple function which doubles the value of a number:

def double(num):
    doubled_value = num * 2
    return doubled_value

print(double(2))    # outputs 4
print(double(4))    # outputs 8

There are a number of important points about the above example you should ensure you understand:

  • The def keyword tells Python that you want to define a function. The word directly after def is the name of the function, so the function above is called double.
  • After the name, there is a set of parentheses, inside which you put the function’s parameters (a fancy term which just means the function’s inputs). Important: If your function does not need any parameters, you still need to include the parentheses; just don’t put anything inside them.
  • The def line must end with a colon; otherwise, Python will not be happy (i.e., it will raise a SyntaxError). Together, the entire line with the def statement is called the function signature.
  • All of the lines after the def statement contain the code that makes up the function, indented one level inward. Together, these lines make up the function body.
  • The last line of the function above is the return statement, which specifies the output of a function using the return keyword. A return statement does not necessarily need to be the last line of a function, but after it is encountered, Python will exit the function, and no more lines of code will be run. More complex functions may have multiple return statements.
  • You call a function by writing its name, and putting the desired inputs in parentheses. If you are calling a function with no inputs, you still need to include the parentheses.
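Two of the points above, functions with no parameters and functions with multiple return statements, can be sketched as follows. The function names greeting and describe are hypothetical, and the example relies on if statements, which this article does not otherwise cover.

```python
def greeting():
    # No parameters, but the parentheses are still required.
    return 'Hello!'


def describe(num):
    # The first return statement reached ends the function.
    if num < 0:
        return 'negative'
    if num == 0:
        return 'zero'
    return 'positive'


print(greeting())      # Hello!
print(describe(-3))    # negative
print(describe(0))     # zero
print(describe(12))    # positive
```

In describe, only one of the three return statements runs on any given call, because Python exits the function as soon as it reaches one.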

Python and Data Visualization

Now then, let me address the question you may be asking yourself: Why all this Python review to begin with? After all, there are many ways you can visualize data, and they certainly aren’t all restricted by knowledge of Python, or even programming in general.

This is true, but as a data scientist, it is likely that you will need to program at some point—and within programming, it is exceedingly likely the language you use will be Python. When you’ve just been handed a data cleaning and analysis pipeline by the data engineers on your team, it pays to know how to quickly and effectively turn it into a set of actionable and presentable visual insights.

Python is important to know for data visualization generally speaking, for several reasons:

  • It is an accessible language. If you are just transitioning into data science and visualization work, it will be much easier to program visualizations in Python than it will be to work with lower-level tools such as D3 in JavaScript.
  • There are many different and popular libraries in Python, all of which provide the ability to visualize data with code that builds directly on the Python basics we learned above. Examples include Matplotlib, Seaborn, Plotly, and Vega-Altair (previously known as just Altair). I will explore some of these, especially Altair, in future articles.
  • Furthermore, the libraries above all integrate seamlessly with pandas, the foundational data science library in Python. Data in pandas can be directly incorporated into the code logic from these libraries to build visualizations; you often won’t even need to export or transform it before you can start visualizing.
  • The basic principles discussed in this article may seem elementary, but they go a long way toward enabling data visualization:
    • Computing expressions correctly and understanding those written by others is essential to ensuring you are visualizing an accurate representation of the data.
    • You’ll often need to store specific values or sets of values for later incorporation into a visualization—you’ll need variables for that.
      • Sometimes, you can even store entire visualizations in a variable for later use or display.
    • The more advanced libraries, such as Plotly and Altair, allow you to call built-in (and sometimes even user-defined) functions to customize visualizations.
    • Basic knowledge of Python will enable you to integrate your visualizations into simple applications that can be shared with others, using tools such as Plotly Dash and Streamlit. These tools aim to simplify the process of building applications for data scientists who are new to programming, and the foundational concepts covered in this article will be enough to get you started using them.
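To make the connection concrete without relying on any particular library, here is a minimal sketch that uses only expressions, variables, and a function to build a text-based bar chart. It is not how Matplotlib, Plotly, or Altair work internally; the function bar and the data values are hypothetical and exist only for illustration.

```python
def bar(label, value):
    # Build one row of a text bar chart: a padded label followed by '#' marks.
    return label.ljust(8) + '#' * value


# Hypothetical data, stored in variables for later use.
apples = 5
pears = 3

# The whole "visualization" is itself stored in a variable before display.
chart = bar('apples', apples) + '\n' + bar('pears', pears)
print(chart)
```

The visualization libraries mentioned above tend to follow a broadly similar pattern: you store data in variables, pass it to functions, and keep the resulting chart in a variable until you are ready to display it.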

If that’s not enough to convince you, I’d urge you to click on one of the links above and start exploring some of these visualization tools yourself. Once you start seeing what you can do with them, you won’t go back.

For my part, I’ll be back in the next article to present my own tutorial for building visualizations. (One or more of these tools may make an appearance.) Until then!

References
