I’ve been scraping the Game of Thrones wiki in preparation for a meetup at Women Who Code next week. While extracting character allegiances, I wanted to insert missing line breaks to separate the different allegiances.
I initially tried creating a line break like this:
>>> from bs4 import BeautifulSoup
>>> tag = BeautifulSoup("<br />", "html.parser")
>>> tag
<br/>
It looks like it should work, but later on in my script I check the ‘name’ attribute to work out whether I’ve got a line break, and it doesn’t return the value I expected:
>>> tag.name
u'[document]'
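My script assumes it’s going to return the string ‘br’, but the BeautifulSoup constructor returns an object representing the whole document rather than the tag itself, so I needed another way of creating the tag. The following does the trick: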
>>> from bs4 import Tag
>>> tag = Tag(name="br")
>>> tag
<br></br>
>>> tag.name
'br'
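For what it’s worth, here is a rough sketch of two other ways to get a real ‘br’ tag: pulling it out of the parsed document with `.br`, or asking an existing soup for one with `new_tag`. The markup below is made up for illustration (it is not the wiki’s actual HTML), and the variable names are my own assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for an allegiance cell; not the wiki's real HTML
html = '<div class="allegiance"><a>House Stark</a><a>House Tully</a></div>'
soup = BeautifulSoup(html, "html.parser")

# Parsing a fragment returns the whole-document object;
# the actual tag is a child of it, reachable as .br
parsed_br = BeautifulSoup("<br />", "html.parser").br
print(parsed_br.name)  # br

# new_tag creates a standalone tag tied to this soup's parser.
# Insert one after every link except the last to separate the allegiances.
links = soup.div.find_all("a")
for link in links[:-1]:
    link.insert_after(soup.new_tag("br"))

child_names = [child.name for child in soup.div.children]
print(child_names)  # ['a', 'br', 'a']
```

Either way the ‘name’ check in a script like mine should see ‘br’. A tag made with `new_tag` also knows from the parser that ‘br’ is an empty element, so it should print as `<br/>` rather than `<br></br>`.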
That’s all for now, back to scraping for me!