Be a Problem Solver

Tag: Pandas

Replacing Values in a Pandas DataFrame Using a Mask

To update values in a pandas DataFrame, we can use a mask to select rows based on a condition. The following example uses a DataFrame with type and value columns. We’d like to replace NaN values in the value column with the previous value, but only for rows where the type column equals a.

Old:
type  value
a     1
a     NaN
b     3
b     NaN

New:
type  value
a     1
a     1
b     3
b     NaN

To achieve this, we can use the mask(cond, other) method, which replaces values where cond evaluates to True with those specified in other.
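
As a minimal sketch of how this could look for the table above: the snippet below builds the example DataFrame, combines the two conditions into one boolean mask, and passes it to mask. Using ffill() to supply the "previous value" is an assumption made here for illustration (shift() would be another option), and the variable names df and cond are chosen for this example only.

```python
import numpy as np
import pandas as pd

# Example data matching the "Old" table.
df = pd.DataFrame({
    "type": ["a", "a", "b", "b"],
    "value": [1, np.nan, 3, np.nan],
})

# True only for rows that are of type "a" AND have a missing value.
cond = (df["type"] == "a") & df["value"].isna()

# Where cond is True, take the forward-filled value; elsewhere keep the original.
# Assumption: ffill() stands in for "the previous value".
df["value"] = df["value"].mask(cond, df["value"].ffill())

print(df)
```

Note that ffill() carries forward the last non-missing value regardless of type, so if the previous value should only come from rows of the same type, a grouped fill would be needed instead.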

How to Correctly Read European Thousands and Decimals from a CSV File

Many European locales write numbers with a period (.) as the thousands separator and a comma (,) as the decimal point, the reverse of the US style. When pandas reads a CSV file containing such numbers with its default settings, it cannot parse them as numbers and leaves the column as text. To fix this, pass the thousands and decimal parameters to the read_csv function.

from io import StringIO

import pandas as pd

test_data = """
col1;col2
A;3.000,12
B;2.000,22
"""

# Only the decimal separator is set, so "3.000,12" cannot be parsed as a number.
wrong = pd.read_csv(StringIO(test_data), sep=';', decimal=",")
# If the numbers have both separators, make sure to specify both parameters.
correct = pd.read_csv(StringIO(test_data), sep=';', decimal=",", thousands=".")

print(wrong.dtypes)    # col2 is not numeric
print(correct.dtypes)  # col2 is float64