W3Schools has many excellent tutorials. Their tutorial on HTML DOM is at http://www.w3schools.com/htmldom/default.asp
These pages cover specific properties of the Expression Web object model as they relate to certain aspects of the DOM.
A good starting point is to open an .htm file to experiment with. Make sure it does not contain any data that you need.
You can get rid of most of the ActiveDocument by using
ActiveDocument.DocumentHTML = " "
This will leave only one blank space in the ActiveDocument.
You could also use:
ActiveDocument.DocumentHTML = ""
If you use the quotation marks without a space in between them, the ActiveDocument will be left with:
<html> <body> </body> </html>
It doesn't really matter whether you set the DocumentHTML to " " or to "". With either method, if you run the statement
MsgBox ActiveDocument.all.Length
the message box will pop up and display "3".
This means that the ActiveDocument still has three elements, even though you have erased the entire DocumentHTML.
To see what these three elements are, you can use:
MsgBox ActiveDocument.all.Item(0).tagName
MsgBox ActiveDocument.all.Item(1).tagName
MsgBox ActiveDocument.all.Item(2).tagName
The results will show that
item(0) tagName is "html"
item(1) tagName is "head"
item(2) tagName is "body"
You could also use:
MsgBox ActiveDocument.all(0).tagName
MsgBox ActiveDocument.all(1).tagName
MsgBox ActiveDocument.all(2).tagName
The results will be the same for either
ActiveDocument.all.Item(0)
or for
ActiveDocument.all(0).
I believe that writing out the Item property explicitly is the recommended style, but I will sometimes switch back and forth.
So even if you delete all of the text in your document, it will still have 3 html elements.
This is because the start tag and end tag are optional for the html, head and body elements.
Deleting the text just removes the start tag and the end tag, but those three elements will still remain on an Expression Web ActiveDocument.
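The three MsgBox calls above can also be written as a loop. The following is a minimal sketch, assuming the macro runs inside the Expression Web VBA editor with a page open. The Sub name List_All_Elements is only illustrative, and the output goes to the Immediate window through Debug.Print rather than to a message box.

```vba
Sub List_All_Elements()
    Dim i As Long
    ' Walk the all collection by index and print each element's tag name.
    For i = 0 To ActiveDocument.all.Length - 1
        Debug.Print i & ": " & ActiveDocument.all.Item(i).tagName
    Next i
End Sub
```

On an emptied page this should list the same three elements described above: html, head and body.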
The first thing I usually put on an empty page is the Doctype. I use:
Dim strDocType As String
strDocType = "<!DOCTYPE HTML PUBLIC " & """" & _
    "-//W3C//DTD HTML 4.01 Transitional//EN" & _
    """" & " " & """" & _
    "http://www.w3.org/TR/html4/loose.dtd" & """" & ">"
ActiveDocument.DocumentHTML = strDocType
The ActiveDocument will now look like:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
The above output will probably be on a single continuous line. I wrapped it here so that it would display in its entirety.
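The same Doctype string can be built a little more compactly. This sketch relies on the VBA rule that two consecutive quotation marks inside a string literal stand for one literal quotation mark. The Sub name Set_DocType is only illustrative.

```vba
Sub Set_DocType()
    Dim strDocType As String
    ' "" inside a string literal produces a single " character.
    strDocType = "<!DOCTYPE HTML PUBLIC " & _
                 """-//W3C//DTD HTML 4.01 Transitional//EN"" " & _
                 """http://www.w3.org/TR/html4/loose.dtd"">"
    ActiveDocument.DocumentHTML = strDocType
End Sub
```

Either form assigns the same text to DocumentHTML; the choice is only a matter of which is easier to read.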
The next thing I usually do is to add the start tag and the end tag for the html, head and body elements.
To set the outerHTML of these three elements you can use:
Sub Add_HTML_Head_Body_OuterHTML()
    Dim strHTML As String
    Dim strHead As String
    Dim strBody As String
    Dim objHTMLElement As IHTMLElement
    Dim objHeadElement As IHTMLElement
    Dim objBodyElement As IHTMLElement

    ' References to the three elements that remain on an empty page.
    Set objHTMLElement = ActiveDocument.all.Item(0)
    Set objHeadElement = ActiveDocument.all.Item(1)
    Set objBodyElement = ActiveDocument.all.Item(2)

    ' Each string holds a start tag and an end tag on separate lines.
    strHTML = vbCrLf & vbCrLf & "<html>" & _
              vbCrLf & _
              "</html>"
    strHead = "<head>" & _
              vbCrLf & _
              "</head>" & _
              vbCrLf
    strBody = "<body>" & _
              vbCrLf & _
              "</body>" & _
              vbCrLf

    ' The collection is queried again for each assignment.
    ActiveDocument.all.Item(0).outerHTML = strHTML
    ActiveDocument.all.Item(1).outerHTML = strHead
    ActiveDocument.all.Item(2).outerHTML = strBody
End Sub
The results will be:
<html> <head> </head> <body> </body> </html>
To start examining the hierarchy, you can use:
MsgBox ActiveDocument.all(0).Children.Length
This will show that element (0) has two children.
You can see the children's names with:
MsgBox ActiveDocument.all(0).Children(0).tagName
MsgBox ActiveDocument.all(0).Children(1).tagName
The children's names will be "head" and "body".
To see the parent name of these children you could use:
MsgBox ActiveDocument.all(0).Children(0).parentElement.tagName
This will show that the parentElement tagName of the first child is "html". The same holds true for the other child, "body".
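The parent and child checks above can be combined into one walk over the whole hierarchy. The following is a rough sketch rather than a tested macro. The names Show_Hierarchy and Print_Element are hypothetical, the elements are handled as late-bound Object variables, and the output goes to the Immediate window through Debug.Print.

```vba
Sub Show_Hierarchy()
    ' Start at the element that the text above treats as the root, all(0).
    Print_Element ActiveDocument.all(0), 0
End Sub

' Hypothetical helper: prints one element, then recurses into its children.
Sub Print_Element(ByVal objElement As Object, ByVal lngDepth As Long)
    Dim colChildren As Object
    Dim i As Long

    ' Indent two spaces per level so that children appear under their parent.
    Debug.Print String(lngDepth * 2, " ") & objElement.tagName

    Set colChildren = objElement.Children
    For i = 0 To colChildren.Length - 1
        Print_Element colChildren.Item(i), lngDepth + 1
    Next i
End Sub
```

For the page built so far, this should print the html element on the first line, with head and body indented beneath it.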
So at this point the ActiveDocument has one root element called "html", and "html" has two children, "head" and "body".
You should also be able to see the 2 children of "html" by using:
MsgBox ActiveDocument.all(0).innerHTML
The results will incorrectly show as:
<html> <head> </head> <body> </body> </html>
I believe this is another bug in Expression Web.
The correct output should be just the HTML between the html start tag and the html end tag, which would be:
<head> </head> <body> </body>
I'm not aware of any other elements that show the incorrect innerHTML.
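One possible workaround, not verified here, is to build the inner HTML of the root element from the outerHTML of its children. This sketch assumes that reading outerHTML on head and body returns the expected markup. The Sub name Show_Root_Inner_HTML is only illustrative.

```vba
Sub Show_Root_Inner_HTML()
    Dim colChildren As Object
    Dim strInner As String
    Dim i As Long

    Set colChildren = ActiveDocument.all(0).Children

    ' Concatenate each child's outerHTML in document order.
    For i = 0 To colChildren.Length - 1
        strInner = strInner & colChildren.Item(i).outerHTML & vbCrLf
    Next i

    MsgBox strInner
End Sub
```

Any whitespace between the children in the original markup is replaced here by a single line break, so the result may not match the source text character for character.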