In this short guide, you’ll see how to compare values between two Pandas DataFrames. You’ll also observe how to compare values from two imported files. Show
Step 1: Prepare the datasets to be comparedTo start, let’s say that you have the following two datasets that you want to compare: First Dataset:
Second Dataset:
The ultimate goal is to compare the prices (i.e., price_1 vs. price_2). Step 2: Create the two DataFramesBased on the above data, you can then create the following two DataFrames using this code: import pandas as pd data_1 = {'product_1': ['computer','monitor','printer','desk'], 'price_1': [1200,800,200,350] } df1 = pd.DataFrame(data_1) print(df1) data_2 = {'product_2': ['computer','monitor','printer','desk'], 'price_2': [900,800,300,350] } df2 = pd.DataFrame(data_2) print (df2) Run the code in Python, and you’ll get these two DataFrames:
Step 3: Compare the values between the two Pandas DataFramesIn this step, you’ll need to import the NumPy package. You can then use this template to perform the comparison: df1['new column that will contain the comparison results'] = np.where(condition,'value if true','value if false') For our example, here is the syntax that you can add in order to compare the prices (i.e., price_1 vs. price_2) between the two DataFrames: df1['prices_match'] = np.where(df1['price_1'] == df2['price_2'], 'True', 'False') You’ll notice that a new column (i.e., the ‘prices_match’ column) would be created under the first DataFrame (i.e., df1). This new column would contain the comparison results based on the following rules:
Here is the complete Python code that you can use to compare the prices between the two DataFrames: import pandas as pd import numpy as np data_1 = {'product_1': ['computer','monitor','printer','desk'], 'price_1': [1200,800,200,350] } df1 = pd.DataFrame(data_1) data_2 = {'product_2': ['computer','monitor','printer','desk'], 'price_2': [900,800,300,350] } df2 = pd.DataFrame(data_2) df1['price_2'] = df2['price_2'] #add the price_2 column from df2 to df1 df1['prices_match'] = np.where(df1['price_1'] == df2['price_2'], 'True', 'False') #create a new column in df1 to check if prices match print(df1) Run the code, and you’ll get the following price comparison:
Note that in the above code, the price_2 column from the second DataFrame was also added to the first DataFrame in order to get a better view when comparing the prices. Now what if you want to find the actual differences between the two prices? (price_1) – (price_2) In that case, you may add the following syntax to your code: df1['price_diff'] = np.where(df1['price_1'] == df2['price_2'], 0, df1['price_1'] - df2['price_2']) So the complete Python code would look like this: import pandas as pd import numpy as np data_1 = {'product_1': ['computer','monitor','printer','desk'], 'price_1': [1200,800,200,350] } df1 = pd.DataFrame(data_1) data_2 = {'product_2': ['computer','monitor','printer','desk'], 'price_2': [900,800,300,350] } df2 = pd.DataFrame(data_2) df1['price_2'] = df2['price_2'] #add the price_2 column from df2 to df1 df1['prices_match'] = np.where(df1['price_1'] == df2['price_2'], 'True', 'False') #create a new column in df1 to check if prices match df1['price_diff'] = np.where(df1['price_1'] == df2['price_2'], 0, df1['price_1'] - df2['price_2']) #create a new column in df1 for price diff print(df1) Once you run the code, you’ll get the actual differences between the prices:
Compare Values from two Imported FilesLastly, you’ll see how to compare values from two imported files. Let’s say that you have the following data stored in a CSV file called file_1:
While you have the data below stored in a second CSV file called file_2:
You can then import the above files into Python. In order to do so, you’ll need to specify the paths where the CSV files are stored on your computer. For demonstration purposes, let’s assume that the paths of the files are:
Once you imported the CSV files into Python, you’ll be able to assign each file into a DataFrame, where:
As before, the goal is to compare the prices (i.e., price_1 vs. price_2). So here is the complete Python code to compare the values from the two imported files: import pandas as pd import numpy as np data_1 = pd.read_csv(r'C:\Users\Ron\Desktop\Test\file_1.csv') df1 = pd.DataFrame(data_1) data_2 = pd.read_csv(r'C:\Users\Ron\Desktop\Test\file_2.csv') df2 = pd.DataFrame(data_2) df1['price_2'] = df2['price_2'] df1['prices_match'] = np.where(df1['price_1'] == df2['price_2'], 'True', 'False') df1['price_diff'] = np.where(df1['price_1'] == df2['price_2'], 0, df1['price_1'] - df2['price_2']) print(df1) Once you run the code in Python (adjusted to your paths), you’ll get the differences between the prices:
Final note when comparing DataFramesWhen you compare two DataFrames, you must ensure that the number of records in the first DataFrame matches with the number of records in the second DataFrame. In our example, each of the two DataFrames had 4 records, with 4 products and 4 prices. If, for example, one of the DataFrames had 5 products, while the other DataFrame had 4 products, and you tried to run the comparison, you would then get the following error: ValueError: Can only compare identically-labeled Series objects How do I compare two datasets?When you compare two or more data sets, focus on four features:. Center. Graphically, the center of a distribution is the point where about half of the observations are on either side.. Spread. The spread of a distribution refers to the variability of the data. ... . Shape. ... . Unusual features.. How do I compare two values in a list Python?Using list.
sort() method sorts the two lists and the == operator compares the two lists item by item which means they have equal data items at equal positions. This checks if the list contains equal data item values but it does not take into account the order of elements in the list.
How do you compare results in Python?== and is are two ways to compare objects in Python. == compares 2 objects for equality, and is compares 2 objects for identity.. Example 1 compares 2 strings. ... . Example 2 creates list a and b which eventually refer to the same object.. |