Adventures in D8'ing

Submitted by Barrett on Sat, 07/25/2015 - 10:07

It's time to make the jump to D8. Let's face it, my site is mostly a haphazardly maintained blog; it could be done in GitHub Pages without any real loss of functionality. There's nothing beyond core that I need, so I might as well make the jump now.

First thing I notice: is pathauto still not in core? REALLY? When was the last time you saw a site using node/xxx paths?


Drupal4Gov Half-Day Security Session

Submitted by Barrett on Mon, 04/06/2015 - 19:59

Mid last month, Dave Stoline and I presented the Security session at the Drupal4Gov half-day event at OPM. The PDF of the slides is attached here. Dave focused on common security vulnerabilities and working with the Drupal Security Team, while I talked about security-related trends like the use of a separate edit domain and HTTPS everywhere.
 


Web.config as a honeypot

Submitted by Barrett on Fri, 12/12/2014 - 08:43

One of my clients has started using an automated security scanning tool to regularly crawl their Drupal site looking for vulnerabilities. In their first run of the tool, it identified only one issue: a "predictable resource location" vulnerability based on the presence of the web.config file in the Drupal docroot. Since web.config is used by IIS servers and they're not on IIS, the presence of the file isn't really a security issue for them, but if the scanner is going to complain, it's easy enough to remove it from the code base[1]. The web.config file is commonly flagged by such scanners and just as commonly removed from the code base.

What I hadn't considered before is an idea proposed by the client's tech lead: monitoring access attempts for the web.config file as a honeypot for malicious traffic. Since they're not using IIS, any access of that file may well indicate a drive-by vulnerability scanner. Obviously what you do in response to it could be complicated and you wouldn't want to automatically block the IP or something, but flagging the IP for investigation is reasonable and easy enough to do, especially if you're using a log monitoring tool like Sumologic where you can set up automatic alerts.
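
If you don't have a log monitoring service in place, even a quick script can do the flagging. The sketch below is purely illustrative: it assumes an Apache access log in the combined format (client IP as the first field) at a path you'd swap for your own, and it simply lists the IPs that have requested web.config, with a hit count, so they can be queued for review.

<?php
// Illustrative sketch: scan an Apache access log for requests to web.config
// and print the client IPs so they can be flagged for investigation.
// The log path and the combined log format are assumptions; adjust for
// your environment.
$logFile = '/var/log/apache2/access.log';

$handle = fopen($logFile, 'r');
if (!$handle) {
  exit("Could not open $logFile" . PHP_EOL);
}

$suspects = array();
while (($line = fgets($handle)) !== FALSE) {
  // only lines that request web.config are of interest
  if (stripos($line, 'web.config') === FALSE) {
    continue;
  }
  // in the combined log format, the client IP is the first field
  $ip = strtok($line, ' ');
  $suspects[$ip] = isset($suspects[$ip]) ? $suspects[$ip] + 1 : 1;
}
fclose($handle);

// one pipe-delimited line per IP, ready to feed into a review queue
foreach ($suspects as $ip => $hits) {
  echo "$ip|$hits" . PHP_EOL;
}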

Footnotes

[1] Personally, though, I think it's more maintainable to add a deny rule for it to the .htaccess file. That way you don't have to remove the file again each time you update core; you have to merge your .htaccess on update anyway, and it often carries custom rules already. A simple rewrite rule like the one below does the trick:

RewriteRule ^/?web\.config$ - [F,L]


Periodic Assignment module

Submitted by Barrett on Sun, 11/02/2014 - 11:57

https://www.evernote.com/shard/s90/sh/87799ffd-baf2-4ff9-b028-185dd2d0e…

I've been toying with an idea for what I'm presently calling the Periodic Assignment module. The basic idea is that a user should be able to subscribe to get intermittent assignments from a Drupal site. The user would then complete the assignment and the site would log the completion. For instance, given a set of different exercises, I could have the site email me once an hour with a random exercise to do. Or, for people with a practice of journaling, the site could keep a list of topics they could write on and once a day send them one of those topics as a prompt.
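
To make that a bit more concrete, here is a minimal sketch of what the hourly-exercise case might look like as a Drupal 7 hook_cron() implementation. It's purely illustrative, not actual module code; the module name, the exercise list, and the recipient address are all placeholders.

<?php
/**
 * Implements hook_cron().
 *
 * Illustrative sketch only: once an hour, pick a random exercise and mail it
 * to a subscriber. A real module would store assignments and subscriptions
 * in the database and log completions.
 */
function periodic_assignment_cron() {
  // placeholder assignment list; in a real module this would be content
  $exercises = array('20 push-ups', '30 squats', 'a one-minute plank');

  // only send once an hour, no matter how often cron runs
  $last = variable_get('periodic_assignment_last_sent', 0);
  if (REQUEST_TIME - $last < 3600) {
    return;
  }

  $assignment = $exercises[array_rand($exercises)];
  drupal_mail('periodic_assignment', 'assignment', 'subscriber@example.com',
    language_default(), array('assignment' => $assignment));

  variable_set('periodic_assignment_last_sent', REQUEST_TIME);
}

/**
 * Implements hook_mail().
 */
function periodic_assignment_mail($key, &$message, $params) {
  if ($key == 'assignment') {
    $message['subject'] = t('Your assignment for this hour');
    $message['body'][] = $params['assignment'];
  }
}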

It's still very much conceptual at the moment, but the idea is solidifying as I mull it over. If you're interested, what documentation/planning I've done so far is at the link above.


The Conflicted Developer

Submitted by Barrett on Sat, 10/04/2014 - 12:28

"The only thing that saves us from the bureaucracy is inefficiency. An efficient bureaucracy is the greatest threat to liberty."

- Eugene McCarthy

I always feel conflicted about my role as a developer when thinking of this quote from McCarthy. As a developer, I want the most efficient possible data recording, indexing, and searching. At the same time, though, in the past it was inconceivable to implement mass monitoring of the public because the effort to record, store, and correlate the information was simply too great. Clearly that's no longer the case (or at least, is rapidly becoming less of the case), as witnessed by the NSA monitoring programs, the Great Firewall of China, etc.



What does the White House's Executive Order mean for Open Government

Submitted by Barrett on Sat, 08/10/2013 - 09:14

I wrote the following piece after looking into the Executive Order on Open Data. The original is published at https://www.acquia.com/blog/what-does-white-houses-executive-order-mean…

The White House's Executive Order of May 9 will cause a shift in the way that Federal agencies present data. The Executive Order, “Making Open and Machine Readable the New Default for Government Information,” mandates that “the default state of new and modernized Government information resources shall be open and machine readable.”

So what does this mean for Federal agencies? Well, that’s a bit nebulous as of right now. But the Order from the White House does define three milestones for issuing further guidance and clarification, of which two have been met. The third was due to be complete August 7, but has not yet been released. To date, the OMB has released the Open Data Policy memo and the White House has published Project Open Data, an online repository of tools and implementation information.

What is not yet available is the Cross-Agency Priority (CAP) Goal due from the Chief Performance Officer. While the Open Data Policy memo and Project Open Data likely provide sufficient detail around which to effectively implement a program in keeping with the goals of the Executive Order on open data, it is the CAP Goal against which Federal agencies must report their progress.

Did I mention the first of those reports must be made within 180 days of the date of the Order? That means Federal agencies must first report by November 5, 2013, with subsequent reports made quarterly.

What we do know is that the Open Data Policy memo defines five major requirements:

1. Collect or create information in a way that supports downstream information processing and dissemination activities
2. Build information systems to support interoperability and information accessibility
3. Strengthen data management and release practices
4. Strengthen measures to ensure that privacy and confidentiality are fully protected and that data is properly secured
5. Incorporate new interoperability and openness requirements into core agency processes

These top-level requirements shouldn’t be anything which Federal agencies aren’t already doing. Of course, each top-level requirement contains sub-requirements and those are where the rubber meets the road. Four of these sub-requirements in particular are likely to be new activities for Federal agencies, so are worth highlighting here:

The first requirement, 1a, mandates the use of “machine-readable and open formats.” This means that information must be stored in a format which can be easily read by a computer without loss of meaning (machine-readable) and that the format should be public rather than proprietary (open formats).

The second requirement, 1c, calls for the use of open licenses for released data sets. Open licensing of released data ensures that the data can be used with no restrictions on how it is transmitted, adapted, or otherwise used for either commercial or non-commercial purposes.

The third requirement, 3b, mandates that agencies establish and maintain a public data listing. Specifically, agencies must host a www.[agency].gov/data URL which provides a listing of the datasets which could be made publicly available. This listing must be available in both human-readable (i.e., HTML) and machine-readable formats, in order to allow data.gov and other aggregators to discover agency data sets. The maintenance clause here is key: the listing must be maintained and updated over time to comply with the requirement. Additionally, the listing should include not just those data sets which are already available but also those which could be made available. That means if your agency is working with a data set but has not yet fully cleaned it and stripped out PII, it should still be listed at your /data URL.

The final requirement, 3c, requires agencies to engage with customers (i.e., the public) to “facilitate and prioritize data release.” This builds on requirement 3b to add a mechanism by which the public can provide feedback to the agency in question to help set the priorities for release of additional datasets, influence the formats in which data is released, or otherwise shape agency data release processes.

Big changes are underway for the practice of data management in the Federal sector. More data release and more collaboration with customers can only serve to increase the rate of change and open up new areas in which the government is “by the people” and “for the people.”

In subsequent posts in this series we’ll delve further into the Open Data mandate and explore possible solutions for implementing its requirements in Drupal.


How to Determine Which Nodes Are Using Pathauto Paths

Submitted by Barrett on Sat, 11/17/2012 - 19:39

Recently at work, one of the site managers asked for a listing of which nodes on the site were using the auto-generated Pathauto paths and which were not. Should be easy, right? Just figure out where Pathauto stores whatever variable it uses to indicate that a node has the "Automatic alias" flag set and dump the list. Turns out it's not that simple. Pathauto doesn't store a variable. Instead, it runs a set of logic checks to determine whether the "Automatic alias" flag should be set each time a node is edited.

What you have to do, then, is load each node in question and compare the path it's using against the path which Pathauto would use if it were going to generate an alias. The pathauto_create_alias() function will tell you the latter part, so long as you pass 'insert' as the second parameter instead of 'update'. (Otherwise, the function returns an empty value if it determines a new alias is not needed.)

The full script I put together to satisfy the site manager's request was:

<?php
// make pathauto functions accessible
module_load_include('inc', 'pathauto');

// get published nodes from the database
$query = "select nid from {node} where status = 1";
$resultSet = db_query($query);

// load each node, compare path to what pathauto would generate and output
while ($nid = db_result($resultSet)) {

  $node = node_load($nid, NULL, TRUE);

  $placeholders = pathauto_get_placeholders('node', $node);
  $alias = pathauto_create_alias('node', 'insert', $placeholders, "node/$node->nid", $node->type, $node->language);

  // a simple boolean to indicate if the paths match, to make sorting/filtering of the results easy
  $aliasMatch = ($node->path == $alias? 1 : 0);

  // strip line breaks, tabs, and pipes from the titles cause they make a mess when we try to
  // open the output in Excel, then convert multiple spaces into a single space
  $cleanTitle = str_replace(array("\r\n", "\r", "\n", "\t", "|"), " ", $node->title);
  $cleanTitle = preg_replace('/ {2,}/', ' ', $cleanTitle);

  echo "$node->nid|$node->type|$cleanTitle|$node->path|$alias|$aliasMatch" . PHP_EOL;
}

?>


Migrating from CVS to Git

Submitted by Barrett on Sat, 11/03/2012 - 20:03

One of the initiatives begun since I came on board at USP is to convert the team from using CVS for version control to using Git. CVS was performing adequately in most respects, but the leadership recognized that Git is the de facto industry standard these days and that, beyond the technical benefits of moving to Git, there was value in keeping up with the standards of the industry and the Drupal community. The question, then, was how to best accomplish the change.

The first course we investigated was using git cvsimport. Our hope was that this would allow us to move our entire repository history into Git. When we started testing it out, though, we found the results unclear and weren't certain that everything had been reflected in the resulting branches. Given that uncertainty, we decided instead to do a more static conversion and leave the pre-Git commit history in CVS.

Because of the business needs we had to support, we decided on a branching strategy of three long-running branches, one for each of our development, testing, and production environments. Because there was no point in time when all three environments were in sync, each branch needed to be initialized independently. We also wanted to minimize the downtime, particularly on production. We decided to essentially wrap CVS in Git; our process, sketched in commands after the list below, was:

  1. Create an empty repository on the origin and add a .gitignore file to take care of the files we didn't need to track (like the CVS directories and the .# files CVS uses for tracking).
  2. Add files from the production web server by:
    1. running git init on the production server
    2. adding the repository created using git remote add
    3. checking out the master branch to pull down the .gitignore file we created
    4. adding and committing all the files from production to the master branch of the repository
  3. Set up the dev and test branches by:
    1. moving all the files from the environment to another location
    2. cloning the repository onto the web server
    3. creating and checking out a new environment branch (i.e., dev on the dev server or test on the test server)
    4. moving the original files back onto the web server
    5. adding and committing the changes to the environment branch
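
In command form, the process looked roughly like the sketch below. This is illustrative rather than our exact runbook; the docroot path and the remote URL are placeholders.

# --- Step 2: on the production web server ---
cd /var/www/docroot
git init
git remote add origin git@git.example.com:site.git
git fetch origin
git checkout master          # pulls down the .gitignore created in step 1
git add .
git commit -m "Initial import of the production code base"
git push origin master

# --- Step 3: on the dev server (repeat with 'test' on the test server) ---
mv /var/www/docroot /var/www/docroot.orig            # move the existing files aside
git clone git@git.example.com:site.git /var/www/docroot
cd /var/www/docroot
git checkout -b dev                                  # create the environment branch
cp -a /var/www/docroot.orig/. .                      # move the original files back
git add .
git commit -m "Initial import of the dev code base"
git push origin dev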

The end result of this process was that we had a working Git repo and still have access to all the CVS history by running CVS on the same file set. All that with no downtime on production at all and only about thirty minutes of downtime on each of the dev and test environments (including a lot of running git status to double check that everything was as we expected).



hook_career_alter()

Submitted by Barrett on Thu, 07/05/2012 - 09:42

On July 13th, I'll bid farewell to GDIT and the EOP and move on to the next step in my career. It's been an honor to work with the team there, and I've learned more in the past year than I would have imagined possible; a small sampling of those lessons is below.

  • #!'s in URLs are forever.
  • --skip-lock-tables. For the love of dog, use --skip-lock-tables
  • Team dynamics are critical. The right combination of people in the right environment can produce some amazing outcomes and make even the most arduous environment bearable. Unfortunately, that balance is remarkably fragile. It takes very little mis-management to upset the balance and destroy morale.
  • Unicorns bleed rainbows.
  • cd -: How have I lived this long without having known about this command?
  • Mongo is wonderful until you need to do a simple join query, in which case you're looking at writing at least a hundred lines of application-layer code.
  • CDN's are insanely expensive but worth every single penny when people in Guy Fawkes masks decide to pound on your site.
  • Kanban is a wonderful management tool, created by 3M as a ploy to sell Post Its.*

Credit and disclaimers:

hook_career_alter() was originally developed by webchick. I'm just running a custom implementation of it.
* I have no proof that 3M created Kanban...but it just makes so much sense.



When the Birds of a Feather Don't Flock

Submitted by Barrett on Thu, 03/22/2012 - 14:36

Some months ago, I got involved with implementing OpenID on Drupal for work. One of the things that struck me was that the OpenID modules appeared to be largely dormant. With DrupalCon coming up, the logical solution seemed to be to get together a Birds of a Feather (BoF) session. I scheduled the session, posted about it in the OpenID Drupal Group to try to drum up support and interest among those already interested in OpenID, and tweeted about it during the conference. Then the time came and I sat there alone.

The question now is, "How do I interpret the fact that no one showed up for the session?" Is it me; am I somehow being shunned by the Drupal community? Or maybe it's the technology; no one cares about OpenID. Or maybe it's just that the other BoFs scheduled for that time slot were of greater interest.

Perhaps it's just ego-protection, but I think I can rule out shunning. First, I don't think I've done anything that would warrant a shunning, and second, I think the Drupal community is just too heterogeneous and loosely coupled to effectively organize a conference-wide shunning.

It's possible there just isn't much interest in OpenID in the community. Certainly, OpenID has been panned of late (37signals, WekeRoad, Quora) and the lack of activity on the Drupal OpenID Group kind of hints at that interpretation.

It's also entirely possible that other things out-competed it. There were some other very intriguing BoFs going on at the same time. So perhaps the love for OpenID is there, but the love for other BoFs was greater.

I guess we'll see what happens when I propose the session for CapitalCamp this year.
