.SHARP{C#DERS}

.NET development, server management, and related topics

HTML sanitization for ASP.NET

This is an old post from my now-defunct personal blog.


I spent a great deal of time trying to find a reasonable way to clean HTML. I wanted to remove script tags, broken HTML, etc., but still allow rich text editing. Most sanitization routines are just too strict. My input comes from CKEditor, so users have a fair number of formatting options that I want to keep.

This code came from a number of sources; I converted it to VB.NET and tweaked it a little.

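To use it, call

HtmlUtility.SanitizeHtml("your messy HTML")

This first removes any element that is not in the whitelist (along with its contents) and any attribute that is not whitelisted for its tag, then balances the remaining markup by removing any opening tag that is never closed and any closing tag that was never opened.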

It requires the HTML Agility Pack - http://nuget.org/packages/HtmlAgilityPack

 

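Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions
Imports HtmlAgilityPack

Public NotInheritable Class HtmlUtility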

    ''' <summary>
    ''' Removes ALL html from a string
    ''' </summary>
    ''' <param name="source">Html source</param>
    ''' <returns>The text content with every tag removed</returns>
    ''' <remarks></remarks>
    Public Shared Function StripHTML(source As String) As String
        If String.IsNullOrEmpty(source) Then Return String.Empty
        Dim doc = New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(source)
        Return doc.DocumentNode.InnerText
    End Function

    ''' <summary>
    ''' Takes raw HTML input and cleans against a whitelist
    ''' </summary>
    ''' <param name="source">Html source</param>
    ''' <returns>Clean output</returns>
    Public Shared Function SanitizeHtml(source As String) As String
        If source = String.Empty Then Return String.Empty
        source = Sanitize(source)
        source = BalanceTags(source)
        Return source
    End Function

    Private Shared _namedtags As New Regex("</?(?<tagname>\w+)[^>]*(\s|$|>)", RegexOptions.Singleline Or RegexOptions.ExplicitCapture Or RegexOptions.Compiled)

    ''' <summary> 
    ''' http://refactormycode.com/codes/360-balance-html-tags
    ''' attempt to balance HTML tags in the html string 
    ''' by removing any unmatched opening or closing tags 
    ''' IMPORTANT: we *assume* HTML has *already* been  
    ''' sanitized and is safe/sane before balancing! 
    '''  
    ''' CODESNIPPET: A8591DBA-D1D3-11DE-947C-BA5556D89593 
    ''' </summary> 
    Public Shared Function BalanceTags(html As String) As String
        If [String].IsNullOrEmpty(html) Then
            Return html
        End If

        ' convert everything to lower case; this makes 
        ' our case insensitive comparisons easier 
        Dim tags As MatchCollection = _namedtags.Matches(html.ToLowerInvariant())

        ' no HTML tags present? nothing to do; exit now 
        Dim tagcount As Integer = tags.Count
        If tagcount = 0 Then
            Return html
        End If

        Dim tagname As String
        Dim tag As String
        Const ignoredtags As String = "<p><img><br><li><hr><input>"
        Dim match As Integer
        Dim tagpaired = New Boolean(tagcount - 1) {}
        Dim tagremove = New Boolean(tagcount - 1) {}

        ' loop through matched tags in forward order 
        For ctag As Integer = 0 To tagcount - 1
            tagname = tags(ctag).Groups("tagname").Value

            ' skip any already paired tags 
            ' and skip tags in our ignore list; assume they're self-closed 
            If tagpaired(ctag) OrElse ignoredtags.Contains("<" & tagname & ">") Then
                Continue For
            End If

            tag = tags(ctag).Value
            match = -1

            If tag.StartsWith("</") Then
                ' this is a closing tag 
                ' search backwards (previous tags), look for opening tags 
                For ptag As Integer = ctag - 1 To 0 Step -1
                    Dim prevtag As String = tags(ptag).Value
                    If Not tagpaired(ptag) AndAlso prevtag.StartsWith("<" & tagname, StringComparison.InvariantCulture) Then
                        ' minor optimization; we do a simple possibly incorrect match above 
                        ' the start tag must be <tag> or <tag{space} to match 
                        If prevtag.StartsWith("<" & tagname & ">") OrElse prevtag.StartsWith("<" & tagname & " ") Then
                            match = ptag
                            Exit For
                        End If
                    End If
                Next
            Else
                ' this is an opening tag 
                ' search forwards (next tags), look for closing tags 
                For ntag As Integer = ctag + 1 To tagcount - 1
                    If Not tagpaired(ntag) AndAlso tags(ntag).Value.Equals("</" & tagname & ">", StringComparison.InvariantCulture) Then
                        match = ntag
                        Exit For
                    End If
                Next
            End If

            ' we tried, regardless, if we got this far 
            tagpaired(ctag) = True
            If match = -1 Then
                ' no partner found; mark for removal
                tagremove(ctag) = True
            Else
                ' mark the partner as paired too
                tagpaired(match) = True
            End If
        Next

        ' loop through tags again, this time in reverse order 
        ' so we can safely delete all orphaned tags from the string 
        For ctag As Integer = tagcount - 1 To 0 Step -1
            If tagremove(ctag) Then
                html = html.Remove(tags(ctag).Index, tags(ctag).Length)
                System.Diagnostics.Debug.WriteLine("unbalanced tag removed: " & tags(ctag).ToString)
            End If
        Next

        Return html
    End Function

    Private Shared ReadOnly Whitelist As New Dictionary(Of String, String())() From { _
    {"p", New String() {"style", "class", "align"}}, _
    {"head", New String() {"style", "class", "align"}}, _
    {"body", New String() {"style", "class", "align"}}, _
    {"pre", New String() {"style", "class", "align"}}, _
    {"div", New String() {"style", "class", "align"}}, _
    {"span", New String() {"style", "class"}}, _
    {"br", New String() {"style", "class"}}, _
    {"hr", New String() {"style", "class"}}, _
    {"label", New String() {"style", "class"}}, _
    {"h1", New String() {"style", "class"}}, _
    {"h2", New String() {"style", "class"}}, _
    {"h3", New String() {"style", "class"}}, _
    {"h4", New String() {"style", "class"}}, _
    {"h5", New String() {"style", "class"}}, _
    {"h6", New String() {"style", "class"}}, _
    {"font", New String() {"style", "class", "color", "face", "size"}}, _
    {"strong", New String() {"style", "class"}}, _
    {"b", New String() {"style", "class"}}, _
    {"em", New String() {"style", "class"}}, _
    {"i", New String() {"style", "class"}}, _
    {"u", New String() {"style", "class"}}, _
    {"strike", New String() {"style", "class"}}, _
    {"ol", New String() {"style", "class"}}, _
    {"ul", New String() {"style", "class"}}, _
    {"li", New String() {"style", "class"}}, _
    {"blockquote", New String() {"style", "class"}}, _
    {"code", New String() {"style", "class"}}, _
    {"a", New String() {"style", "class", "href", "title", "target", "name"}}, _
    {"img", New String() {"style", "class", "src", "height", "width", "alt", "title", "hspace", "vspace", "border"}}, _
    {"table", New String() {"style", "class", "width", "cellpadding", "cellspacing", "align", "border"}}, _
    {"thead", New String() {"style", "class"}}, _
    {"tbody", New String() {"style", "class"}}, _
    {"tfoot", New String() {"style", "class"}}, _
    {"th", New String() {"style", "class", "scope"}}, _
    {"tr", New String() {"style", "class"}}, _
    {"td", New String() {"style", "class", "colspan"}}, _
    {"q", New String() {"style", "class", "cite"}}, _
    {"cite", New String() {"style", "class"}}, _
    {"abbr", New String() {"style", "class"}}, _
    {"acronym", New String() {"style", "class"}}, _
    {"del", New String() {"style", "class"}}, _
    {"ins", New String() {"style", "class"}}, _
    {"form", New String() {"style", "class", "method", "name", "action"}}, _
    {"iframe", New String() {"style", "class", "frameborder", "height", "width", "src", "allowfullscreen"}}, _
    {"input", New String() {"name", "type", "value", "class"}} _
    }


    ''' <summary>
    ''' Strip tags not in whitelist
    ''' http://stackoverflow.com/questions/3107514/html-agility-pack-strip-tags-not-in-whitelist
    ''' </summary>
    ''' <param name="input"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Public Shared Function Sanitize(input As String) As String
        Dim htmlDocument = New HtmlDocument()

        htmlDocument.LoadHtml(input)
        SanitizeNode(htmlDocument.DocumentNode)

        Return htmlDocument.DocumentNode.WriteTo().Trim()
    End Function

    Private Shared Sub SanitizeChildren(parentNode As HtmlNode)
        For i As Integer = parentNode.ChildNodes.Count - 1 To 0 Step -1
            SanitizeNode(parentNode.ChildNodes(i))
        Next
    End Sub

    Private Shared Sub SanitizeNode(node As HtmlNode)
        If node.NodeType = HtmlNodeType.Element Then
            If Not Whitelist.ContainsKey(node.Name) Then
                node.ParentNode.RemoveChild(node)
                Return
            End If

            If node.HasAttributes Then
                Dim allowedAttributes As String() = Whitelist(node.Name)
                For i As Integer = node.Attributes.Count - 1 To 0 Step -1
                    Dim currentAttribute As HtmlAttribute = node.Attributes(i)
                    ' Drop attributes not whitelisted for this tag, and any attribute whose
                    ' value mentions "javascript" in any casing. Removing at most once
                    ' avoids calling Remove on an attribute that is already gone.
                    If Not allowedAttributes.Contains(currentAttribute.Name) OrElse
                       currentAttribute.Value.IndexOf("javascript", StringComparison.OrdinalIgnoreCase) >= 0 Then
                        node.Attributes.Remove(currentAttribute)
                    End If
                Next
            End If
        End If

        If node.HasChildNodes Then
            SanitizeChildren(node)
        End If
    End Sub
End Class
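To make the first step of BalanceTags easier to follow, here is a small standalone sketch that runs the same tag regex over a sample fragment and lists what it finds. The module name TagTokenDemo, the ListTags helper and the sample string are hypothetical, added only for illustration; the pattern itself is copied from _namedtags (without RegexOptions.Compiled, which only affects speed).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

' Hypothetical demo module, not part of HtmlUtility.
Module TagTokenDemo
    ' Same pattern as HtmlUtility._namedtags: it captures the element name
    ' of every opening or closing tag so BalanceTags can try to pair them.
    Private ReadOnly NamedTags As New Regex("</?(?<tagname>\w+)[^>]*(\s|$|>)",
        RegexOptions.Singleline Or RegexOptions.ExplicitCapture)

    ' Returns one "opening:name" or "closing:name" entry per tag, in document
    ' order, after the same lower-casing step BalanceTags performs.
    Function ListTags(html As String) As List(Of String)
        Dim result As New List(Of String)()
        For Each m As Match In NamedTags.Matches(html.ToLowerInvariant())
            Dim kind As String = If(m.Value.StartsWith("</"), "closing", "opening")
            result.Add(kind & ":" & m.Groups("tagname").Value)
        Next
        Return result
    End Function

    Sub Main()
        ' Sample input: <b> is never closed and </i> was never opened.
        For Each entry As String In ListTags("<P>Hello <b>world</P> and </i>bye")
            Console.WriteLine(entry)
        Next
    End Sub
End Module
```

For this sample the tokens are opening:p, opening:b, closing:p and closing:i. Since p is in the ignore list, BalanceTags should only find no partner for the b and i tags, so passing the same string to HtmlUtility.BalanceTags should give `<P>Hello world</P> and bye`.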
