edx-platform/common/lib/xmodule/stringify.py
Victor Shnayder ed35cefa29 Fix html file handling.
* html files are now stored as follows:

If the html file is valid xml, store as html/stuff.xml

If it's not, store as html/stuff.xml, which contains
<html meta1="..."  filename="stuff.html">,
and html/stuff.html, which actually contains the contents.
Warn if the contents are not parseable with lxml's html parser,
but don't error.

* for parseable html, strip out the html tag when storing, so that it isn't
  rendered into the middle of a page

* lots of backcompat to deal with paths.  Can go away soon.

* fix output ordering in clean_xml
2012-08-01 11:40:12 -04:00
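
The storage rule in the commit message can be sketched roughly as below. This is only an illustration under stated assumptions: `plan_html_storage`, its signature, and its dict return value are hypothetical and do not come from the commit. The sketch also leaves out the metadata attributes, the lxml html-parser warning, the stripping of the html tag, and the path backcompat handling.

```python
from lxml import etree


def plan_html_storage(name, contents):
    """Return {relative_path: file_contents} for one html module.

    Hypothetical helper: the name, signature, and return shape are
    illustrative assumptions, not code from the commit.
    """
    try:
        etree.fromstring(contents)
    except etree.XMLSyntaxError:
        # Not valid XML: keep the raw markup in a separate .html file and
        # store only a pointer element in the .xml file.
        pointer = etree.Element('html', filename=name + '.html')
        return {
            'html/%s.xml' % name: etree.tostring(pointer, encoding='unicode'),
            'html/%s.html' % name: contents,
        }
    # Valid XML: a single .xml file holds the content itself.
    return {'html/%s.xml' % name: contents}


print(plan_html_storage('stuff', '<p>ok</p>'))
print(plan_html_storage('stuff', '<p>unclosed<br>'))
```

Returning a mapping instead of writing files keeps the decision easy to inspect. The real code presumably writes to the course filesystem.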

21 lines
708 B
Python

from itertools import chain
from lxml import etree


def stringify_children(node):
    '''
    Return all contents of an xml tree, without the outside tags.
    e.g. if node is parse of
        "<html a="b" foo="bar">Hi <div>there <span>Bruce</span><b>!</b></div></html>"
    should return
        "Hi <div>there <span>Bruce</span><b>!</b></div>"

    fixed from
    http://stackoverflow.com/questions/4624062/get-all-text-inside-a-tag-in-lxml
    '''
    # with_tail=False: tostring() would otherwise also emit c.tail,
    # so each tail would appear twice in the result.
    # encoding='unicode' makes tostring() return text rather than bytes.
    children = ([etree.tostring(c, with_tail=False, encoding='unicode'), c.tail]
                for c in node.getchildren())
    parts = [node.text] + list(chain(*children))
    # filter removes possible Nones in texts and tails
    return ''.join(filter(None, parts))