Mark Needham

Thoughts on Software Development

Python: BeautifulSoup – Insert tag


I’ve been scraping the Game of Thrones wiki in preparation for a meetup at Women Who Code next week. While extracting character allegiances, I wanted to insert the missing line breaks that separate one allegiance from the next.

I initially tried creating a line break like this:

>>> from bs4 import BeautifulSoup
>>> tag = BeautifulSoup("<br />", "html.parser")
>>> tag
<br/>

It looks like it should work, but later on in my script I check the ‘name’ attribute to work out whether I’ve got a line break, and it doesn’t return the value I expected:

>>> tag.name
u'[document]'

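That’s because the BeautifulSoup constructor returns an object representing the whole parsed document, not the br tag inside it. My script assumes the name will be the string ‘br’, so I needed another way of creating the tag. The following does the trick: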

>>> from bs4 import Tag
>>> tag = Tag(name="br")
>>> tag
<br></br>
>>> tag.name
'br'
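
For completeness, here is a minimal sketch of creating the tag through the soup itself with new_tag() and then inserting it. It is an alternative to the Tag(name="br") approach above, not what my script does. The markup is a made-up stand-in for the wiki page, so the element names and house names are assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the wiki page; not copied from the real site.
html = "<div><a>House Stark</a><a>House Tully</a></div>"
soup = BeautifulSoup(html, "html.parser")

# new_tag() builds a Tag tied to this soup's parser, so br is treated as a void element.
br = soup.new_tag("br")
print(br.name)  # br
print(br)       # <br/>

# Place the line break between the two allegiances.
soup.find("a").insert_after(br)
print(soup)  # <div><a>House Stark</a><br/><a>House Tully</a></div>
```

A tag created this way also renders as <br/> rather than <br></br>, because it knows which parser it belongs to.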

That’s all for now, back to scraping for me!


Written by Mark Needham

June 30th, 2016 at 9:28 pm

Posted in Python
