Search and Replace multiple words or characters with Python

A frequently asked question is how to replace all occurrences of a word or a character inside a string or a file.
If you just want to replace a single character or word, all you have to do is use the replace() method that Python provides for that purpose. The Python replace() method takes two arguments and an optional third one. Its definition is:

replace(old, new[, count])

And as the documentation says, it returns a copy of the string with all occurrences of the substring ‘old’ replaced by ‘new’. If the optional argument ‘count’ is given, only the first count occurrences are replaced. If you leave that optional argument out, all occurrences are replaced.
An example of this method might be:

my_text = 'Hello everyone. Say "Hello" to me!'
my_text = my_text.replace('Hello', 'Goodbye')
print my_text
#prints 'Goodbye everyone. Say "Goodbye" to me!'

or:

my_text = 'Hello everyone. Say "Hello" to me!'
print my_text.replace('Hello', 'Goodbye')

or:

print 'Hello everyone. Say "Hello" to me!'.replace('Hello', 'Goodbye')

They are pretty much the same. Use whichever you think is best for the situation or your style.
It’s a very simple and straightforward method that you will not find difficult to include in your code for simple replacements.
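
The optional count argument described above is easy to overlook, so here is a small sketch of it. It is written for Python 3 (print as a function), which is an assumption on my part since the rest of this post uses Python 2 syntax; the variable names first_only and everything are just for illustration.

```python
my_text = 'Hello everyone. Say "Hello" to me!'

# With count=1 only the first occurrence is replaced.
first_only = my_text.replace('Hello', 'Goodbye', 1)

# Without count every occurrence is replaced.
everything = my_text.replace('Hello', 'Goodbye')

print(first_only)   # Goodbye everyone. Say "Hello" to me!
print(everything)   # Goodbye everyone. Say "Goodbye" to me!

# replace() returns a copy, so the original string is untouched.
print(my_text)      # Hello everyone. Say "Hello" to me!
```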

But what if we want to replace multiple characters or words inside a string or file?
My implementation is a simple one. With it you can replace all occurrences of a single character or word, as with the Python replace() method mentioned above, but also multiple characters and words inside a string or a whole file.
Let’s see it:

def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text

Our method, replace_all(), takes two arguments. The first one, text, is the string (or a file’s text) in which the replacement will take place. The second one, dic, is a dictionary with the word or character(s) to be replaced as the key and the replacement word or character(s) as the value of that key. This dictionary can have just one key:value pair if you want to replace just one word or character, or several key:value pairs if you want to replace multiple words or characters at once.
A sample dictionary looks like this:

reps = {'a':'@', 'e':'3', 's':'5'}

With this dictionary we define that we want to replace ‘a’ with ‘@’, ‘e’ with ‘3’ and ‘s’ with ‘5’. Of course you’ll make your own dictionary with your custom key:values.

So, let’s make a working example and see if it works so far, before we see how our method works.

# define our method
def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text

# the text in which the replacement will take place
my_text = 'Hello everybody.'

# our dictionary with our key:values.
# we want to replace 'H' with '|-|'
# 'e' with '3' and 'o' with '0'
reps = {'H':'|-|', 'e':'3', 'o':'0'}

# bind the returned text of the method
# to a variable and print it
txt = replace_all(my_text, reps)
print txt    # it prints '|-|3ll0 3v3ryb0dy.'

# of course we can print the result
# at once with:
# print replace_all(my_text, reps)

Save it, run it from the console and see what you get.
Pretty simple so far, isn’t it? So let’s get inside our replace_all() method and see how it works.
First we start iterating in our dictionary using the iteritems() method that Python provides for a dictionary:

for i, j in dic.iteritems():

With the iteritems() method you can retrieve the key and the corresponding value at the same time, which is why we use ‘for i, j’ and not a single loop variable. As we iterate, we bind the current key to ‘i’ and its corresponding value to ‘j’.

Next is the replacement method. Here we use the replace() method that we mentioned in the beginning.
We simply say that we want to replace the ‘i’ key with its corresponding ‘j’ value in our text, and then bind the copy returned by replace() to our text again, so that it is always updated with the replacements made so far:

text = text.replace(i, j)

And lastly we return our text so that we can use it.
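
For readers on Python 3, where iteritems() no longer exists, the same loop can be sketched with items() instead. This is only an adaptation under that assumption; the names old and new are my own choice in place of ‘i’ and ‘j’, and the logic is unchanged.

```python
def replace_all(text, dic):
    # items() yields (key, value) pairs, just like iteritems() did in Python 2.
    for old, new in dic.items():
        # replace() returns a new string, so we rebind text on every pass.
        text = text.replace(old, new)
    return text


sample = replace_all('Hello everybody.', {'H': '|-|', 'e': '3', 'o': '0'})
print(sample)  # |-|3ll0 3v3ryb0dy.
```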

Here I must warn you about something that you may run into when using the dictionary with your custom key:value pairs.
If you use this method for simple character or word replacement, single or multiple, then it will work as expected.
For example both of the following dictionaries work as expected:

# for search & replace of whole words
dic = {'hello':'goodbye', 'bad':'good', 'yes':'no'}
# for search & replace of characters
dic = {'a':'@', 'e':'3', 'o':'0', '8':'eight'}

We fill our dictionaries with as many words as we want to be searched and replaced in the first case, and as many characters as we want to be searched and replaced in the second one, and everything works fine.

But what if, on rare occasions, we mix characters and words together and our dictionary looks like the following one?

# assuming that the string in which the replacement
# will take place is 'hello everybody'
dic = {'hel':'HEL', 'e':'3', 'o':'0'}

It will work and replacements will take place, but not the way you expect. You’d expect it to return something like ‘H3Ll0 3v3ryb0dy’ or ‘HELl0 3v3ryb0dy’ I guess, eh?
No. Most probably it will return ‘h3ll0 3v3ryb0dy’, at least that’s what I get when I run it on my machine. But why?
This happens because our ‘e’ key overlaps with the ‘hel’ key, and because when Python iterates through a dictionary, the order in which the keys and values are retrieved is not guaranteed (that holds for the Python 2 versions this post was written for; since Python 3.7 dictionaries keep insertion order). That means you can’t be sure in which order the keys are searched, and most of the time it is definitely not the order in which you defined them. The algorithm Python uses to store and look up dictionary entries is too complex to discuss here. So our ‘e’ key may come before our ‘hel’ key, or on other occasions the opposite. In our example ‘e’ comes first, so it first finds every ‘e’ in our text and replaces it with ‘3’. Then it searches for the ‘hel’ key. But now there is no ‘hel’ in our text, because the previous replacement modified it. Now we have ‘h3l’, so no ‘hel’:‘HEL’ replacement can take place. Understood?
So don’t mix characters with words in your replacements. On the rare occasions that you want to do it, it’s better to define two dictionaries, one for the words and one for the characters, and then call replace_all() twice. So for our previous example, you can do something like this:
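
To see the effect of the order for yourself, here is a small sketch that forces both orders by using a list of pairs instead of a dictionary. The helper name replace_in_order is hypothetical, made up just for this demonstration, and the code assumes Python 3.

```python
def replace_in_order(text, pairs):
    # pairs is a list of (old, new) tuples, applied strictly in list order.
    for old, new in pairs:
        text = text.replace(old, new)
    return text


text = 'hello everybody'

# 'e' is replaced first, so 'hel' no longer exists when its turn comes.
e_first = replace_in_order(text, [('e', '3'), ('hel', 'HEL'), ('o', '0')])

# 'hel' is replaced first, so it survives as 'HEL'.
hel_first = replace_in_order(text, [('hel', 'HEL'), ('e', '3'), ('o', '0')])

print(e_first)    # h3ll0 3v3ryb0dy
print(hel_first)  # HELl0 3v3ryb0dy
```

Same keys, same values, different results. The only thing that changed is the order in which the replacements ran.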

text = 'hello everybody'
w_dic = {'hel':'HEL'}
c_dic = {'E':'3', 'e':'3', 'o':'0'}
text = replace_all(text, w_dic)
text = replace_all(text, c_dic)
print text  # prints 'H3Ll0 3v3ryb0dy'

And of course we must always remember that the Python replace() method is case sensitive; never forget that. That’s the reason we included both ‘E’ and ‘e’ in our dictionary.
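
If you would rather not depend on the order at all, a different technique (not the method of this post) is to do all replacements in a single pass with the re module. The sketch below is one possible way, assuming Python 3; the name replace_all_single_pass is made up for the example. Longer keys are tried first, and text that has already been replaced is never scanned again.

```python
import re


def replace_all_single_pass(text, dic):
    if not dic:
        return text  # nothing to replace
    # Try longer keys first so 'hel' wins over 'e' at the same position.
    keys = sorted(dic, key=len, reverse=True)
    # re.escape() makes every key match literally, even '.' or '|'.
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    # Each match is looked up in the dictionary exactly once.
    return pattern.sub(lambda m: dic[m.group(0)], text)


mixed = replace_all_single_pass('hello everybody',
                                {'hel': 'HEL', 'e': '3', 'o': '0'})
print(mixed)  # HELl0 3v3ryb0dy

# Swapping two characters also works, which chained replace() calls cannot do.
swapped = replace_all_single_pass('ab', {'a': 'b', 'b': 'a'})
print(swapped)  # ba
```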

You can implement this method in whatever way you want inside your code, just find the one that suits you.
And of course if all you want is to replace just a single character or word, you can use the python replace() method mentioned in the beginning, so you don’t need a dictionary and my method. Just use the method I provided for multiple word or character replacements instead.
And both methods of course can be used to replace occurrences inside a simple string or a file’s text. That is up to you and very easy to implement.
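
As a rough sketch of the file case, assuming Python 3 and a file small enough to read into memory at once: read the text, run replace_all() on it and write the result to a second file. The function name replace_in_file, the file names and the temporary directory are all made up for the demonstration.

```python
import os
import tempfile


def replace_all(text, dic):
    for old, new in dic.items():
        text = text.replace(old, new)
    return text


def replace_in_file(in_path, out_path, dic, encoding='utf-8'):
    # Read the whole file into one string.
    with open(in_path, 'r', encoding=encoding) as src:
        text = src.read()
    # Write the replaced text to the output file.
    with open(out_path, 'w', encoding=encoding) as dst:
        dst.write(replace_all(text, dic))


# Small demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, 'in.txt')
    out_path = os.path.join(tmp, 'out.txt')
    with open(in_path, 'w', encoding='utf-8') as f:
        f.write('my book\nmy pen\n')
    replace_in_file(in_path, out_path, {'my': 'your'})
    with open(out_path, 'r', encoding='utf-8') as f:
        demo_result = f.read()

print(demo_result)
```

Writing to a separate output file keeps the original intact in case the replacements are not what you wanted.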

Hope you found this post interesting. For any questions, recommendations or whatever, just leave a comment.

[PS: And once again I must apologize for my bad use of the English language.]

49 comments so far

  1. John Erck on

    I found your post very helpful. Thank you for taking the time to post it.

    -jre

  2. mizanur on

    How do I replace text in a file?
    To replace a word in a line you just need
    some simple code:
    L='my book'
    m=L.replace('my','your')
    print m
    But replacing a word in a file is not easy.
    Does anyone know how?

    • Steve Byrne on

      You could read the file into a variable, use the same .replace('bad word', 'good word') on it, and then write the variable back out to the file (replacing the file).

  3. gomputor on

    @mizanur
    the solution is simple:

    f = open(filename, 'r').read()
    # f now holds the text of the file you opened

    m = f.replace('my','your')
    print m

    Read the python docs for file input and output:
    http://www.python.org/doc/2.5.2/tut/node9.html (for python 2.5.2)

    http://docs.python.org/tutorial/inputoutput.html (for python 2.6)

  4. Michael on

    Found this very, very helpful compared to the rest of the items that comes up on google. Very great!

  5. cron on

    can we count how many words has been replaced in dictionary method

  6. krausyd on

    Hi! This is very useful, but I’m having problems with unicode chars replacement. Can you help me? I’m using python 2.5 and BeautifulSoup for scraping webs. I’m decoding the pages to utf-8. Some of those pages have vowels with acute accents: á,é,í, and so on. I want to replace them with the respective html code: á, é, í… I’ve tried to replace them using chr(225), r'á', u'\u00E1', and it doesn’t work. The same occurs when I find a bullet (u'\u2022') that I want to replace with '*'. Can someone help me? Thanks

  7. fish oil pills on

    @ krausyd:
    try this:

    output = unicode(input, "UTF-8")

    This may work for you. Also, try using soup.prettify()

    I’m new to python and to BeautifulSoup so I’m just trying to offer some help.

  8. Trevor on

    This does not scale. It is O(n*m) where n is the size of the string and m is the number of elements in the dictionary. You should be able to solve this in O(n).

  9. umsert on

    The for loop will cause a problem in some cases, like the one here where you want to change the characters '<', '>' and '&' with '&lt;', '&gt;' and '&amp;'.

  10. umsert on

    The for loop will cause a problem in some cases, like the one here where you want to change the characters '<', '>' and '&' with '&lt;', '&gt;' and '&amp;'.

  11. ShT on

    This was very helpful.

    I have a problem, though. I do not want to replace a word when it is embedded in another word. For example, I want to replace “the” with “The” but not in the word “mother”. I tried \b and /b and it did not work.

  12. wayko on

    Ive tried your method on python 2.6
    text = 'hello everybody'
    >>> w_dic = {'hel':'HEL'}
    >>> c_dic = {'E':'3', 'e':'3', 'o':'0'}
    >>> text = replace_all(text, w_dic)
    >>> text = replace_all(text, c_dic)
    >>> print text # prints 'H3Ll0 3v3ryb0dy'
    HELlo 3v3rybody <— it printed that
    am i doing something wrong?

  13. Gontrand on

    Hi,

    Thanks a lot for these explanation… Got successful cause of you.

    Can donate if you have paypal.

    Gontrand

  14. Fake on

    Thank you for your help. I modified it a bit, using just a normal list, and just using “” instead of j. For my laptimer project that i’ve linked to on my website

  15. Jaideep on

    If I want to replace starting from one word till the next whitespace is found then what i will have to do ??

    Here i want to change ‘playingandfun’ till the white space is found
    example
    “I love playingandfun every where”

    now the changed string should be
    “I love eatingandrunning everywhere”

  16. Jorge on

    Hi Stephen, many thanks for the post.
    I’m learning Python and tried to make a small script to replace multiple chars in a CSV file, using your post as a base. What happens is that only the first key:value pair in my dictionary is replaced at the end. I’m pasting my code so you can help if possible.
    Many thanks
    Jorge

    ***********************************************************

    import sys, string, os, time

    t1 = time.clock()
    dropBx = "C:\\Users\\JO\\Documents\\My Dropbox\0_python\\findandreplace\\"
    inFileName = dropBx+"a.csv"
    outFileName = dropBx+"c.csv"

    #findStr = ["A","B"]
    #replaceStr = "#"

    # define method replace_all
    def replace_all(text, dic):
    for i, j in dic.iteritems():
    text = text.replace(i,j)
    return text

    reps = {'A':'#', "B":'-'} # dictionary for the replacements

    inFile = open(inFileName, 'r')
    inFileStr = inFile.read()
    inFile.close()

    outputStr = replace_all(inFileStr, reps)

    outFile = open(outFileName, 'w')
    outFile.write(outputStr)
    outFile.close()

    print time.clock()-t1/60-0

  17. magnus on

    OMW, you saved my life.

    I have been struggling with python, its my first programming language, and with the help of your program, I was able to do what I needed and to make more sense out of it as well! I just needed to change the input to be readable from a txt file, man I am sooooo happy now.

    Great Post.

  18. magnus on

    What would one have to do, for interest sake:
    if you have a long document, and change all the text to be printed out as ‘words’ and all the integers as ‘numbers’?
    via a for loop? (to put it in pseudocode)
    if int: print numbers, else print number

    kind regards

  19. biomechanoid on

    Thanks a lot, your post was of much help.

  20. Husen on

    Thank you very much, I was looking around for such a method.

  21. Pandolfi on

    Thx, it helped me very much.

    fede from Argentina

  22. John on

    I am trying to figure out a way to replace on the second time the word is in the string? Do you have ideas on how to do that?

    For example:

    Hello world, say “Hello” to me!

    would be replaced with the word Goodbye like this:

    Hello world, say “Goodbye” to me!

  23. Jax on

    Thanks for posting this! VERY helpful :)

  24. Heath Blackmon on

    Thanks for the helpful post. I am teaching myself python and posts like this make it easy to extrapolate to my own problems!!

  25. i on

    Just in case, would this also work?
    >>> from string import maketrans
    >>> print "hello".translate(maketrans("elo", "310"))
    h3110
    >>>

  26. Grigoris M on

    Many thanks ! worked great in my case !!

    Grigoris

  27. [...] to change more than one character then we have to do it differently. A nice solution I found at Gomputor’s blog is by defining a function like [...]

  28. lizzy on

    I keep getting errors.
    How to replace the words by defining function dormouse that takes two arguments?

    poem = “twinkle twinkle little star how I wonder what you are”
    dict = [(twinkle,twitter),(star,bat),(how,manner-inwhich),(
    wonder,speculate)]

    >>> dormouse(poem,dict)
    twitter twitter little bat manner-in-which I speculate what you
    are

    poem = “humpty dumpty sat on a wall humpty dumpty had a great fall”
    dict = [(humpty dumpty, the egg),(sat,rested-on-its
    behind),(wall,bar),(great,gigantic),(fall,depreciation)]

    >>>dormouse(poem,dict)
    the egg rested-on-its-behind on a bar the egg had a gigantic
    depreciation

  29. Paulo Braga on

    I’m trying to test method replace_all() using PyScripter.
    However, I repeatedly get an error:
    The line which contains
    text = text.replace(i, j)
    appears marked in red and an error window pops up saying:
    AttributeError: ‘_sre.SRE.Match’ object has no attribute ‘replace’

  30. sahar on

    very helpful ! thanks

  31. Cyrus on

    Question 1: Encoding a Text File (5 marks)
    We want to encode text files to keep the data private. Write a program that does the following:
    • Read a text file (check for exceptions) and print the contents of the file (as one long string)
    • Analyse the encoding instructions
    • Apply the encoding instructions to the contents of the file
    • Write the contents to a new file
    • Open the new file and print its contents

    The encoding instructions will be given in the form "ae;ea;s3".
    For this example this would mean to change ‘a’ to ‘e’, change ‘e’ to ‘a’ and change ‘s’ to ‘3’.
    Your program needs to also work with any other encoding instructions provided.

    The new file must have the old filename with the word “Encoded” added to the basic name.
    For example, “message.txt” will become “messageEncoded.txt”.

    The first line of your main function must look like this:
    def encode(pattern, filename) :

    An example output would look like this:
    >>> encode("ae;ea;s3", "F:\\message.txt")
    This is a test message
    Change a to e
    Change e to a
    Change s to 3
    Thi3 i3 e ta3t ma33ega

  32. Cyrus on

    Hi guys,

    Any one here can help me to solve the above problem?
    Many Thanks in advance

  33. nanders on

    Old post, but very helpful. Thanks a lot!

  34. Nathaniel Gier on

    How might you write the str.replace without using the replace function?

  35. Ben on

    Just a thanks for this post….

  37. Loben Maishen on

    Hello,

    I found this post helpful to a certain extent, but I tried the methods above and it only finds whole words. How can I change that?

    Thanks,
    Loben

  38. Incredibly helpful post. Thank you very much

  39. shahana on

    thanks alot

  40. Chizuru on

    A huge help. Thanks for explaining the dict iteration process.

  41. Lian Tombing on

    Really useful. thanks a lot for the code.

  42. Sofrodca on

    Hello
    I’m just a beginner in Python and have the following trouble: I must take all the data from a database column and to replace all commas in there by decimal points. I’m trying to adapt your code but it is not that clear for me how to call the database file or the table column instead of the string
    my_text = 'Hello everybody.'

    I did something, but it does not work. Please help me out.

    bd="/home/user/Bureau/data/file.sqlite"
    cn=sqlite3.connect(bd) # Connecting to database

    cur= cn.cursor() # Creation of a cursor
    cur.execute("SELECT column_name FROM table_name")
    mydata=list(cur)

    and then I did

    def replace_all(text, dic):
    for i, j in dic.iteritems():
    text = text.replace(i, j)
    return text

    my_text = mydata

    reps = {',' : '.'}

    txt = replace_all(my_text, reps)
    print txt

  43. c4ifford on

    So I got this working, which is great because there isn’t much on the web for accomplishing this. However, I need to see if I can use the re (regex) module and do a re.sub instead of a replace, and I’m not sure exactly how that would work. So instead of doing text = text.replace(x, y), I would be doing text = re.sub(x, y), which means the dictionary would be reps = {'item.*' : 'item.value'} or something like that. I’m not sure how it’s really going to work; I’m just starting to play with this after I got the first example working.

  44. c4ifford on

    def replace_all(text, dic):
    for x, y in dic.iteritems():
    #text = text.replace(x, y)
    text = re.sub(x + "=*", x + "=" + y, text)
    return text

    is what I came up with.

  45. Ulf Almroth on

    I have tried to use the function in Python 3.3. It required a few changes. Now it works on strings pasted into a text string variable, but not on text read into a string from a file. When read from a file, only one entry from the dictionary is replaced. The code looks like this:
    def processTextFile(indatafil,utdatafil):
    indatafil=input("Input file? ")
    utdatafil=input("Output file? ")
    korrektioner={'Š':'ä','š':'ö','Œ':'å'} # lookup dictionary for character swaps
    print(korrektioner)

    fi=open(indatafil,'r')
    fu=open(utdatafil,'w',encoding="utf-8")
    instring=fi.read() # read the file in as a string
    print(instring)
    orden=instring.split() # split removes line breaks etc.
    print(orden)

    texten = ' '.join(orden) # join gives a string with spaces
    print(texten)
    def byt_text(text, dic): # the function looks up entries in the dictionary
    for i, j in dict.items(dic): # and swaps characters by calling replace
    text = text.replace(i, j)
    return text
    nytext=byt_text(texten,korrektioner)
    print(nytext)
    fu.write(nytext)
    fu.close()

    processTextFile('in','ut')

  46. tina on

    hello! i just want to ask something. If you have a clientlist.csv and you make it show like this:

    Andrew,Jen
    Stephan,Anna
    Jason,Kennedy
    Kiki,Stewart

    ….

    and you want to be shown like this:

    Andrew,Jen
    Jason,Kennedy
    Kiki,Stewart
    Stephan,Anna
    ….

    how you will do it?

  47. Ryan Stutzman on

    Thanks for the info. I’m trying to find (or write if one can’t be found) a python script for Notepad++ that deletes all instances of a selected string when I click CTRL+DEL. So basically Copy–>Find->Replace All (with blank text). Any suggestions?

