Search and Replace multiple words or characters with Python
A frequently asked question is how to replace all occurrences of a word or a character inside a string or a file.
If you just want to replace a single character or word, all you have to do is use the replace() method that Python provides for that purpose. The replace() method takes two arguments and an optional third one. Its definition is:
replace(old, new[, count])
As the documentation says, it returns a copy of the string with all occurrences of the substring ‘old’ replaced by ‘new’. If the optional argument ‘count’ is given, only the first count occurrences are replaced. If you leave that optional argument out, all occurrences are replaced.
An example of this method might be:
my_text = 'Hello everyone. Say "Hello" to me!'
my_text = my_text.replace('Hello', 'Goodbye')
print my_text
#prints 'Goodbye everyone. Say "Goodbye" to me!'
or:
my_text = 'Hello everyone. Say "Hello" to me!'
print my_text.replace('Hello', 'Goodbye')
or:
print 'Hello everyone. Say "Hello" to me!'.replace('Hello', 'Goodbye')
They all do the same thing. Use whichever you think is best for the situation or your style.
It’s a very simple and straightforward method that you will not find difficult to include in your code for simple replacements.
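The optional count argument described earlier can be sketched as follows. This is only a minimal illustration; it is written in Python 3 syntax (print as a function) rather than the Python 2 used in the rest of this post, and the variable names are made up for the example.

```python
# Minimal sketch of the optional count argument of str.replace().
# Assumption: Python 3 syntax (print is a function).
my_text = 'Hello everyone. Say "Hello" to me!'

# No count given: every occurrence is replaced.
all_replaced = my_text.replace('Hello', 'Goodbye')

# A count of 1: only the first occurrence is replaced.
first_only = my_text.replace('Hello', 'Goodbye', 1)

print(all_replaced)  # Goodbye everyone. Say "Goodbye" to me!
print(first_only)    # Goodbye everyone. Say "Hello" to me!
```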
But what if we want to replace multiple characters or words inside a string or file?
My implementation is a simple one. With it you can replace all occurrences of a single character or word, as with the Python replace() method mentioned above, but also multiple characters and words inside a string or a whole file.
Let’s see it:
def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text
Our method, replace_all(), takes 2 arguments. The first one, text, is the string (or a file’s text) in which the replacement will take place. The second one, dic, is a dictionary with the word or character(s) to be replaced as the key, and the replacement word or character(s) as the value of that key. This dictionary can have just one key:value pair if you want to replace just one word or character, or multiple key:value pairs if you want to replace multiple words or characters at once.
A sample dictionary looks like this:
reps = {'a':'@', 'e':'3', 's':'5'}
With this dictionary we define that we want to replace ‘a’ with ‘@’, ‘e’ with ‘3’ and ‘s’ with ‘5’. Of course you’ll make your own dictionary with your own key:value pairs.
So let’s build a working example and check that it works, before we look at how our method works.
# define our method
def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text

# the text in which the replacement will take place
my_text = 'Hello everybody.'

# our dictionary with our key:value pairs.
# we want to replace 'H' with '|-|',
# 'e' with '3' and 'o' with '0'
reps = {'H':'|-|', 'e':'3', 'o':'0'}

# bind the returned text of the method
# to a variable and print it
txt = replace_all(my_text, reps)
print txt # it prints '|-|3ll0 3v3ryb0dy.'

# of course we can print the result
# at once with:
# print replace_all(my_text, reps)
Save it, run it from the console and see what you get.
Pretty simple so far, isn’t it? So let’s get inside our replace_all() method and see how it works.
First we start iterating over our dictionary using the iteritems() method that Python provides for a dictionary:
for i, j in dic.iteritems():
With the iteritems() method you can retrieve the key and the corresponding value at the same time, and that’s why we use ‘for i, j’ and not a single loop variable. As we iterate, we bind the current key to ‘i’ and its corresponding value to ‘j’.
Next is the replacement itself. Here we use the replace() method that we mentioned in the beginning.
We simply say that we want to replace the ‘i’ key with its corresponding ‘j’ value in our text, and then bind the copy returned by the replace() method to our text again, so it is always updated with the replacements that have taken place so far:
text = text.replace(i, j)
And lastly we return our text so that we can use it.
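For readers on Python 3, where dict.iteritems() no longer exists, the same loop can be sketched with items(). This is an adaptation and not the code of this post; the loop variables are renamed to old and new only for readability.

```python
# Python 3 adaptation of replace_all() (a sketch, not the post's original code).
# dict.iteritems() was removed in Python 3; dict.items() takes its place.
def replace_all(text, dic):
    for old, new in dic.items():
        # str.replace() returns a new string, so rebind text each time
        text = text.replace(old, new)
    return text


reps = {'H': '|-|', 'e': '3', 'o': '0'}
print(replace_all('Hello everybody.', reps))  # |-|3ll0 3v3ryb0dy.
```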
Here I must warn you about something that you may run into when using a dictionary with your own key:value pairs.
If you use this method to replace only characters or only words, single or multiple, then it will work as expected.
For example, both of the following dictionaries work as expected:
# for search & replace of whole words
dic = {'hello':'goodbye', 'bad':'good', 'yes':'no'}
# for search & replace of characters
dic = {'a':'@', 'e':'3', 'o':'0', '8':'eight'}
We fill our dictionaries with as many words as we want to search and replace in the first case, and as many characters as we want to search and replace in the second one, and everything works fine.
But what if, on rare occasions, we mix characters and words together and our dictionary looks like the following one?
# assuming that the string in which the replacement
# will take place is 'hello everybody'
dic = {'hel':'HEL', 'e':'3', 'o':'0'}
It will work and replacements will take place, but not the way you expect. You’d expect it to return something like ‘H3Ll0 3v3ryb0dy’ or ‘HELl0 3v3ryb0dy’, I guess, eh?
No. Most probably it will return ‘h3ll0 3v3ryb0dy’; at least that’s what I get when I run it on my machine. But why?
This happens because our ‘e’ key overlaps with the ‘hel’ key, and because when Python iterates through a dictionary, the order in which the keys and values are retrieved is arbitrary. (This is true of the Python 2 used here; from Python 3.7 on, dictionaries keep insertion order.) That means you can’t be sure in which order the keys are visited, and often it is not the order in which you wrote them in your dictionary. The way Python stores a dictionary internally is too complex to discuss here. So our ‘e’ key may come before our ‘hel’ key, or on other occasions the opposite. In our example ‘e’ comes first, so first every ‘e’ in our text is replaced with ‘3’. Then comes the ‘hel’ key. But now there is no ‘hel’ in our text, because we modified it with the previous replacement. Now we have ‘h3l’, so no ‘hel’:‘HEL’ replacement can take place. Understood?
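The order dependence can be reproduced deterministically by applying the same replacements from an explicit list of pairs. The helper name replace_in_order is hypothetical and the sketch uses Python 3 syntax; it only illustrates the effect described above.

```python
# Hypothetical helper: apply the replacements in exactly the order given.
def replace_in_order(text, pairs):
    for old, new in pairs:
        text = text.replace(old, new)
    return text


text = 'hello everybody'

# 'e' first: 'hel' has already become 'h3l', so 'hel' never matches.
e_first = replace_in_order(text, [('e', '3'), ('hel', 'HEL'), ('o', '0')])

# 'hel' first: the word is replaced before the single characters.
hel_first = replace_in_order(text, [('hel', 'HEL'), ('e', '3'), ('o', '0')])

print(e_first)    # h3ll0 3v3ryb0dy
print(hel_first)  # HELl0 3v3ryb0dy
```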
So don’t mix characters with words in your replacements. On the rare occasions when you want to do it, it’s better to define two dictionaries, one for the words and one for the characters, and then call the replace_all() method twice. For our previous example, you can do something like this:
text = 'hello everybody'
w_dic = {'hel':'HEL'}
c_dic = {'E':'3', 'e':'3', 'o':'0'}
text = replace_all(text, w_dic)
text = replace_all(text, c_dic)
print text # prints 'H3Ll0 3v3ryb0dy'
And of course we must always remember that the Python replace() method is case sensitive; don’t ever forget that. That’s the reason we included both ‘E’ and ‘e’ in our dictionary.
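A different technique, not the one used in this post, is to do all the replacements in a single pass with the re module, so that an earlier replacement can never affect a later one. The sketch below assumes Python 3, and the name replace_all_once is hypothetical. Longer keys are tried first, so that ‘hel’ wins over ‘e’ where both could match.

```python
import re


def replace_all_once(text, dic):
    """Replace every key of dic with its value in one pass over text."""
    if not dic:
        return text
    # Longest keys first, so 'hel' is preferred over 'e' at the same position.
    keys = sorted(dic, key=len, reverse=True)
    # re.escape() keeps characters such as '|' or '.' literal.
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    # The callback looks up the replacement for whichever key matched.
    return pattern.sub(lambda m: dic[m.group(0)], text)


reps = {'hel': 'HEL', 'e': '3', 'o': '0'}
print(replace_all_once('hello everybody', reps))  # HELl0 3v3ryb0dy
```

Because replaced text is not scanned again, the ‘E’ in ‘HEL’ stays as it is, so the result differs from the two-dictionary version above.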
You can use this method in whatever way you want inside your code; just find the one that suits you.
And of course, if all you want is to replace a single character or word, you can use the Python replace() method mentioned in the beginning, so you need neither a dictionary nor my method. Use my method for multiple word or character replacements instead.
Both methods can of course be used to replace occurrences inside a simple string or a file’s text. That is up to you and very easy to implement.
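Applying the replacements to a file could look like the sketch below. It assumes Python 3, UTF-8 text and a file small enough to read into memory; the function name replace_in_file and its parameters are hypothetical.

```python
def replace_all(text, dic):
    # Same loop as in the post, with items() for Python 3.
    for old, new in dic.items():
        text = text.replace(old, new)
    return text


def replace_in_file(in_path, out_path, dic, encoding='utf-8'):
    """Read in_path, apply the replacements, write the result to out_path."""
    with open(in_path, 'r', encoding=encoding) as src:
        text = src.read()
    with open(out_path, 'w', encoding=encoding) as dst:
        dst.write(replace_all(text, dic))
```

Writing to a separate output file keeps the original untouched in case the replacements turn out to be wrong.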
Hope you found this post interesting. For any questions, recommendations or whatever, just leave a comment.
[PS: And once again I must apologize for my bad use of the English language.]
I found your post very helpful. Thank you for taking the time to post it.
-jre
How to replace in a file?
To replace a word in a line you just need
a simple piece of code:
L='my book'
m=L.replace('my','your')
print m
But replacing a word in a file is not easy.
Does anyone know how?
@mizanur
the solution is simple:
f = open(filename, 'r').read()
# f now holds the text of the file you opened
m = f.replace('my','your')
print m
Read the python docs for file input and output:
http://www.python.org/doc/2.5.2/tut/node9.html (for python 2.5.2)
http://docs.python.org/tutorial/inputoutput.html (for python 2.6)
Found this very, very helpful compared to the rest of the items that come up on Google. Very great!
Can we count how many words have been replaced with the dictionary method?
Hi! This is very useful, but I’m having problems with replacing unicode characters. Can you help me? I’m using python 2.5 and BeautifulSoup for scraping web pages. I’m decoding the pages to utf-8. Some of those pages have vowels with acute accents: á, é, í, and so on. I want to replace them with the respective HTML codes: &aacute;, &eacute;, &iacute;… I’ve tried to replace them using chr(225), r'á', u'\u00E1', and it doesn’t work. The same happens when I find a bullet (u'\u2022') that I want to replace with '*'. Can someone help me? Thanks
Help for “Defining Python Source Code Encodings” is here: http://www.python.org/peps/pep-0263.html
Have a nice coding.
@ krausyd:
try this:
output = unicode(input, "UTF-8")
This may work for you. Also, try using soup.prettify()
I’m new to python and to BeautifulSoup so I’m just trying to offer some help.
This does not scale. It is O(n*m) where n is the size of the string and m is the number of elements in the dictionary. You should be able to solve this in O(n).
The for loop will cause a problem in some cases, like the one here where you want to change the characters '<', '>' and '&' into '&lt;', '&gt;' and '&amp;'.
This was very helpful.
I have a problem, though. I do not want to replace a word when it is embedded in another word. For example, I want to replace “the” with “The” but not in the word “mother”. I tried \b and /b and it did not work.
I’ve tried your method on python 2.6
>>> text = 'hello everybody'
>>> w_dic = {'hel':'HEL'}
>>> c_dic = {'E':'3', 'e':'3', 'o':'0'}
>>> text = replace_all(text, w_dic)
>>> text = replace_all(text, c_dic)
>>> print text # prints 'H3Ll0 3v3ryb0dy'
HELlo 3v3rybody <— it printed that
Am I doing something wrong?
Hi,
Thanks a lot for this explanation… I got it working because of you.
Can donate if you have paypal.
Gontrand
Thank you for your help. I modified it a bit, using just a normal list, and just using “” instead of j. For my laptimer project that i’ve linked to on my website
If I want to replace everything from the start of one word until the next whitespace, what do I have to do?
Here I want to change ‘playingandfun’ up to the next whitespace.
Example:
“I love playingandfun everywhere”
Now the changed string should be
“I love eatingandrunning everywhere”
Hi Stephen many thanks for the post,
I’m learning python and tried to make a small script to replace multiple chars in a csv file, using your post as a base. What happens is that only the first key:value in my dictionary is replaced at the end. I’m pasting my code so you can help if possible.
Many thanks
Jorge
***********************************************************
import sys, string, os, time
t1 = time.clock()
dropBx = "C:\\Users\\JO\\Documents\\My Dropbox\0_python\\findandreplace\\"
inFileName = dropBx+"a.csv"
outFileName = dropBx+"c.csv"
#findStr = ["A","B"]
#replaceStr = "#"
# define method replace_all
def replace_all(text, dic):
for i, j in dic.iteritems():
text = text.replace(i,j)
return text
reps = {'A':'#', "B":'-'} # dictionary for the replacements
inFile = open(inFileName, 'r')
inFileStr = inFile.read()
inFile.close()
outputStr = replace_all(inFileStr, reps)
outFile = open(outFileName, 'w')
outFile.write(outputStr)
outFile.close()
print time.clock()-t1/60-0
OMW, you saved my life.
I have been struggling with python, its my first programming language, and with the help of your program, I was able to do what I needed and to make more sense out of it as well! I just needed to change the input to be readable from a txt file, man I am sooooo happy now.
Great Post.
What would one have to do, for interest sake:
if you have a long document, and change all the text to be printed out as ‘words’ and all the integers as ‘numbers’?
via a for loop? (to put it in pseudocode)
if int: print numbers, else print number
kind regards
Thanks a lot, your post was of much help.
Thank you very much, I was looking around for such a method.
Thx, it helped me very much.
fede from Argentina
I am trying to figure out a way to replace only the second occurrence of a word in a string. Do you have any ideas on how to do that?
For example:
Hello world, say “Hello” to me!
would be replaced with the word Goodbye like this:
Hello world, say “Goodbye” to me!
Thanks for posting this! VERY helpful
Thanks for the helpful post. I am teaching myself python and posts like this make it easy to extrapolate to my own problems!!
Just in case, would this also work?
>>> from string import maketrans
>>> print "hello".translate(maketrans("elo", "310"))
h3110
>>>
Many thanks ! worked great in my case !!
Grigoris
[...] to change more than one character then we have to do it differently. A nice solution I found at Gomputor’s blog is by defining a function like [...]
I keep getting errors.
How to replace the words by defining function dormouse that takes two arguments?
poem = "twinkle twinkle little star how I wonder what you are"
dict = [('twinkle','twitter'),('star','bat'),('how','manner-in-which'),('wonder','speculate')]
>>> dormouse(poem,dict)
twitter twitter little bat manner-in-which I speculate what you are
poem = "humpty dumpty sat on a wall humpty dumpty had a great fall"
dict = [('humpty dumpty','the egg'),('sat','rested-on-its-behind'),('wall','bar'),('great','gigantic'),('fall','depreciation')]
>>> dormouse(poem,dict)
the egg rested-on-its-behind on a bar the egg had a gigantic depreciation
I’m trying to test method replace_all() using PyScripter.
However, I repeatedly get an error:
The line which contains
text = text.replace(i, j)
appears marked in red and an error window pops up saying:
AttributeError: '_sre.SRE_Match' object has no attribute 'replace'
very helpful ! thanks
Question 1: Encoding a Text File (5 marks)
We want to encode text files to keep the data private. Write a program that does the following:
• Read a text file (check for exceptions) and print the contents of the file (as one long string)
• Analyse the encoding instructions
• Apply the encoding instructions to the contents of the file
• Write the contents to a new file
• Open the new file and print its contents
The encoding instructions will be given in the form "ae;ea;s3".
For this example this would mean to change ‘a’ to ‘e’, change ‘e’ to ‘a’ and change ‘s’ to ‘3’.
Your program needs to also work with any other encoding instructions provided.
The new file must have the old filename with the word “Encoded” added to the basic name.
For example, “message.txt” will become “messageEncoded.txt”.
The first line of your main function must look like this:
def encode(pattern, filename) :
An example output would look like this:
>>> encode("ae;ea;s3", "F:\\message.txt")
This is a test message
Change a to e
Change e to a
Change s to 3
Thi3 i3 e ta3t ma33ega
Hi guys,
Any one here can help me to solve the above problem?
Many Thanks in advance
Old post, but very helpful. Thanks a lot!
How might you write the str.replace without using the replace function?
Just a thanks for this post….