While setting up commenting, which is highly recommended for sites whose content appears on Drupal Planet, I came across a confusing situation regarding URLs in content. When using the "Limit allowed HTML tags and correct faulty HTML" filter, one of the options is to add rel="nofollow" attributes to anchor tags. However, in the default Plain Text format, the "Convert URLs into links" filter does not provide that option. So if a user types in an HTML anchor, nofollow gets added. But if they type in a plain URL, it gets converted to an HTML anchor without the nofollow.
To illustrate: suppose I allow anchor links to be entered as HTML, set the option to add rel=nofollow, and also enable the filter that converts URLs to links. If a user enters:
www.nytimes.com <a href="www.nytimes.com">Another NY Times link</a>
The output HTML in the comment is:
<p> <a href="https://www.nytimes.com">https://www.nytimes.com</a><br> <a href="www.nytimes.com" rel="nofollow">Another NY Times link</a> </p>
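A quick way to spot the problem in rendered output is to scan for anchors that lack the attribute. The following is only a sketch using Python's standard `html.parser`; the class and function names (`NofollowAudit`, `links_missing_nofollow`) are made up for illustration and have nothing to do with Drupal itself.

```python
from html.parser import HTMLParser


class NofollowAudit(HTMLParser):
    """Collect the href of every <a> tag that lacks rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, or be absent entirely.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" not in rel_tokens:
            self.missing.append(attributes.get("href"))


def links_missing_nofollow(markup):
    """Return the href values of all anchors without rel="nofollow"."""
    parser = NofollowAudit()
    parser.feed(markup)
    parser.close()
    return parser.missing


# The rendered comment from the example above.
rendered = (
    '<p> <a href="https://www.nytimes.com">https://www.nytimes.com</a><br> '
    '<a href="www.nytimes.com" rel="nofollow">Another NY Times link</a> </p>'
)
print(links_missing_nofollow(rendered))
```

Run against the output above, this reports only the auto-converted link, which is exactly the one the URL filter produced.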
For commenting, I really want to tighten permissions down as far as I can to avoid potential security risks, so the Plain Text format with the "Display any HTML as plain text" filter is the best choice [1]. However, for usability I do want URLs converted to links. But I also want those links set to nofollow for link fraud prevention [2].
By playing with format filter configurations and ordering I was able to put together a solution that works (albeit a little jankily), but it sure feels like this is an area where a core patch could improve the situation. If I have time one day maybe I'll work on that [3].
The solution I came up with is to set the following filters on the input format (the order is significant):
- Display any HTML as plain text
- Convert URLs into links
- Convert line breaks into HTML (i.e. <br> and <p>)
- Limit allowed HTML tags and correct faulty HTML
Then for the allowed HTML tags, I allowed `<a href hreflang> <p> <br>` and checked the `Add rel="nofollow" to all links` option.
The result is that user-entered HTML is rendered as plain text, then URLs and line breaks get converted to HTML, and finally the Limit allowed HTML filter double-checks the markup and adds `rel="nofollow"` to anchor tags. So given a user input comment like the one in the screenshot below, the resulting HTML is:
<a href="http://www.nytimes.com" rel="nofollow">www.nytimes.com</a><br> <h2>This should not be displayed as an h2 element.</h2><br> <a href="<a href="http://www.example.com">If" rel="nofollow">www.example.com">If</a> this is a link to example.com and not <a href="http://www.nytimes.com" rel="nofollow">www.nytimes.com</a>, you've failed.</a>
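Why the order matters can be sketched in a few lines of Python. To be clear, this is not Drupal's code: the four functions below are deliberately crude stand-ins I made up for the four filters, and the "wrong" order is just an assumption about how a format might be configured. The stand-in for the Limit allowed HTML filter only adds nofollow and does no tag stripping. The point is simply that the nofollow step can only tag anchors that already exist when it runs.

```python
import html
import re

# Simplified URL matcher (an assumption, far less thorough than a real filter).
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>&\"']+")


def escape_html(text):
    """Stand-in for "Display any HTML as plain text"."""
    return html.escape(text, quote=True)


def convert_urls(text):
    """Stand-in for "Convert URLs into links"."""
    def to_anchor(match):
        url = match.group(0)
        href = url if url.startswith(("http://", "https://")) else "http://" + url
        return f'<a href="{href}">{url}</a>'
    return URL_PATTERN.sub(to_anchor, text)


def convert_line_breaks(text):
    """Stand-in for "Convert line breaks into HTML"."""
    return "<p>" + text.replace("\n", "<br>\n") + "</p>"


def add_nofollow(text):
    """Stand-in for the nofollow option: tag anchors that have no rel yet."""
    return re.sub(r"<a\b(?![^>]*\brel=)([^>]*)>", r'<a\1 rel="nofollow">', text)


def render(text, filters):
    """Apply each filter in order, as a text format does."""
    for apply_filter in filters:
        text = apply_filter(text)
    return text


working_order = [escape_html, convert_urls, convert_line_breaks, add_nofollow]
wrong_order = [escape_html, add_nofollow, convert_urls, convert_line_breaks]

comment = "www.nytimes.com\n<h2>Not a heading</h2>"
print(render(comment, working_order))
print(render(comment, wrong_order))
```

With the working order the generated link carries `rel="nofollow"` and the `<h2>` stays escaped. With the nofollow step moved ahead of URL conversion, there are no anchors yet when it runs, so the generated link comes out untagged.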
Now, this solution is not perfect. First, it's hinky to set up, and I hate that I have to allow any HTML at all, even though user input is first reduced to plain text. Second, it's also a user experience problem. As you can see in the picture above, the help text contradicts itself: it says both that no HTML is allowed and that the anchor, break, and paragraph tags are allowed.
1. Using the core commenting facility, at least. Add-on tools like Disqus obviate the issue, but I don't want to go that route. I also don't want to require (or even allow) users to register before commenting. And yes, I do require approval of comments before they are visible, but I don't want to have to remember to add rel=nofollow to links.
2. Yes. I want to eat my cake and have it too.
3. I was put on this earth to achieve certain things. At this point I'm so far behind I'll never die.