Excel to Python: MID Function - A Complete Guide | Mito
Home Icon
divider

Functions

divider

Text

divider

MID

How to Use Excel's MID Function in Pandas

Excel's MID function allows you to extract a substring from a given string based on the start position and the number of characters. This can be incredibly handy for parsing structured text data.

This page shows you how to replicate the functionality of Excel's MID function in Python using pandas.

Mito is an open source library that lets you write Excel formulas in Python. Either write the formula directly in Python or use the MID formula in the Mito Spreadsheet and generate the equivalent Python code automatically.

Mito's MID function works exactly like it does in Excel. That means you don't need worry about managing data types, handling errors, or the edge case differences between Excel and Python formulas.

Install Mito to start using Excel formulas in Python.

# Import the mitosheet Excel functions
from mitosheet.public.v3 import *;

# Use Mito's MID function
# Note: No need to convert the column to a string column before
# using Mito's MID function b/c Mito does so automatically
df['extracted'] = MID(df['column'], 5, 3)
Copy!
Clipboard

To mimic Excel's MID function in Python using pandas, we utilize string slicing. The key is understanding the difference in indexing between Excel and Python. Whereas Excel is 1-indexed, Python is 0-indexed. This means that in Python, the first character in a string is at index 0, the second character is at index 1, and so on.

In Excel, you'd use =MID(A1, 5, 3) to extract three characters starting from the fifth character.

In pandas, you can achieve this by combining the `.str` accessor with slicing. Remember, Python is 0-indexed:

# Extract 3 characters starting from the 5th character
df['extracted'] = df['column'].str[4:7]
Copy!
Clipboard

In Excel, you can use =MID(A1, 1, 3) to extract the first three characters.

In pandas, you can achieve this by omitting the start index in the slice:

# Extract the first three characters
df['extracted'] = df['column'].str[:3]
Copy!
Clipboard

In Excel, you can use =MID(A1, 5, LEN(A1)) to extract all characters starting from the fifth character.

In pandas, you can achieve this by omitting the end index in the slice:

# Extract all characters starting from the 5th character
df['extracted'] = df['column'].str[4:]
Copy!
Clipboard

Let's say you have string in the format of 'FirstName LastName' and you want to extract the first name. In Excel, you can use a combination of the MID function and the FIND function to extract the first name. The formula would look like =MID(A1, 1, FIND(" ", A1)-1). This formula finds the position of the space character, subtracts 1, and uses that as the length argument for the MID function.

In pandas, you can achieve this by using the `.str.split()` method to split the string on the space character. The `split` function returns a list of string elements. In this case, the list will have two elements: the first name and the last name. Then, you can then use the `.str[]` accessor to extract the first element of the resulting list:

# Extract the first name
df['A'].str.split(' ').str[0]
Copy!
Clipboard

Assuming phone numbers are stored in the format (123) 456-7890, in Excel, you might use =MID(A1, 2, 3) to extract the area code.

To extract the area code in pandas, use string slicing in conjunction with the `.str` accessor:

df['area_code'] = df['phone_number'].str[1:4]
Copy!
Clipboard

While the process to replicate the MID function in pandas is straightforward using string slicing, there are some potential pitfalls to be aware of. Here are some common mistakes:

Excel is 1-indexed, meaning the first character in a string is at index 1. Python is 0-indexed, meaning the first character in a string is at index 0. This means that if you want to extract the first three characters of a string, you'd use =MID(A1, 1, 3) in Excel and df['column'].str[0:3] (or df['column'].str[:3]) in pandas.

# Extract the first three characters
df['extracted'] = df['column'].str[0:3]
Copy!
Clipboard

Attempting to use the MID functionality on non-string columns will lead to TypeErrors. Always ensure your column is of the string data type before attempting extraction.

# Convert the column to string type
df['column'] = df['column'].astype(str)

Extract characters
df['extracted'] = df['column'].str[start:start+length]
Copy!
Clipboard

By default, pandas will return NaN values if you attempt to apply string slicing to a NaN or None value. This is slightly different from Excel, which will return a blank cell, not #N/A if you attempt to apply the MID function to a blank cell.

To mimic Excel's behavior, you can use the `fillna()` method to replace NaN values with empty strings.

# Convert NaN values to empty strings
df['column'].fillna('', inplace=True)

Extract characters
df['extracted'] = df['column'].str[start:start+length]
Copy!
Clipboard

The MID function in Excel extracts a substring from the middle of a string .

=MID(text, start_num, num_chars)

MID Excel Syntax

ParameterDescriptionData Type
textThe original string.string
start_numThe starting position of the extraction.number
num_charsNumber of characters to extract.number

Examples

FormulaDescriptionResult
=MID("Hello, World!", 8, 5)Extract 5 characters starting from the 8th character.World

Don't re-invent the wheel. Use Excel formulas in Python.

Install Mito

Don't want to re-implement Excel's functionality in Python?

Automate analysis with Mito