My Software Security Toolkit

After being a podcast listener for years and years (having new things to stuff into my ears is the only way that chores around the house get done), I was finally a podcast participant. The good people at Electric Cloud coordinated a panel discussion about continuous delivery/deployment and invited me. Me? Can you believe it? They probably thought they were inviting a different Martin. Like the time I showed up to lead a unit testing seminar and someone thought they were going to get Martin Fowler.

Poor guy.  I can only imagine the depth of his disappointment.

Anyway, here’s the discussion about security concerns and DevOps.

Afterwards, I thought it would be useful to share the tools/techniques that I’m currently using and why. Many of these things are free or open source, and a bunch of them have perfectly good alternatives.

 

External Audit by a Real-Life Security Expert

We do a security audit (including penetration testing) through Applause yearly. If we could afford to do it more often, we totally would. Their security expert not only found potential vulnerabilities, he also took the time to explain the nature and severity of each problem and gave us pointers on how to address them. We distributed the fixes throughout the team as a way to build this sort of security knowledge into the team broadly. We’re less likely to accidentally re-introduce vulnerabilities we didn’t understand before than if we had designated some poor sap as “the security guy”.

 

“Continuous Everything” tools for automation around build/test/deploy/monitor

While these aren’t strictly security tools, being able to ship incremental improvements to code in a safe and repeatable way is key: when a vulnerability does turn up, we can fix it and deploy the fix quickly. We’ve built our infrastructure and process in such a way that we can do fully automated zero-downtime production deployments. It sounds like a luxury for a small team, but yesterday’s luxuries become today’s necessities.

Dependency and Release Management Tools

  • Gradle
  • Sonatype Nexus
  • Our own DIY version stamping, where you can ask any service in any environment exactly what build # it’s running (a sketch of the idea follows this list). When you release many standalone services frequently, you need to know exactly what code is running where.
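
The Gradle/Nexus side is standard fare, but the version stamping deserves a sketch. Our services aren’t actually Python, so treat this as an illustration of the idea only: the build pipeline bakes a build number into the artifact, and every service exposes it over HTTP. The Flask endpoint, file name, and service name here are assumptions, not our real code.

    # Illustration only: expose the build number the pipeline baked into the
    # artifact, so anyone can ask a running service exactly what it is.
    from flask import Flask, jsonify

    app = Flask(__name__)

    def read_build_number(path="build_info.txt"):
        # The build script writes the build number into this file at package time.
        try:
            with open(path) as f:
                return f.read().strip()
        except IOError:
            return "unknown"

    @app.route("/version")
    def version():
        return jsonify(service="example-service", build=read_build_number())

    if __name__ == "__main__":
        app.run(port=8080)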

Static Analysis Tools

Automate the things you can, create a cadence for the things you can’t. There’s actually a LOT of code inspection that you can currently automate.  My goal is to get to zero errors/warnings/anything, but until then, we’ll just ratchet the numbers down bit by bit.

 

Monitoring / Intrusion Detection

Our hosting provider gives us a network-level firewall, monitoring, intrusion detection, and DDoS protection in a way that’s pretty much transparent to me as a developer. In the spirit of “trust but verify”, we have tests that verify (for example) that you can’t connect on ports you shouldn’t be able to connect to.
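
A minimal sketch of what one of those “trust but verify” checks might look like, assuming a pytest-style suite; the host name and port list are placeholders:

    # Assert that ports which should be firewalled off actually refuse connections.
    import socket

    import pytest

    HOST = "api.example.internal"        # hypothetical service host
    CLOSED_PORTS = [3306, 5432, 9200]    # ports that should NOT be reachable

    @pytest.mark.parametrize("port", CLOSED_PORTS)
    def test_port_is_not_reachable(port):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(2)
            # connect_ex returns 0 only when the connection succeeds
            assert sock.connect_ex((HOST, port)) != 0, "port %d is unexpectedly open" % port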

We also install OSSEC on all of our systems, so if something fishy shows up in the logs, we’ll be notified right away.


Small Victory: Rally-Slack Integration

One of the things I missed when we moved our team’s communication hub from Flowdock to Slack was the nice integration with Rally (ahem, CA Agile Central). It was good to know when things like schedule state or ownership changed. Unfortunately, there is no such default connector for Slack. So I made one.

This was my first time using Python, and it took me a while to get everything configured properly (the pyral package only works with Python 2), but after a few pip installs I was up and running. I managed to get it to post to Slack only for the things I care about (changing rank or color? I don’t need to spam the team with that). The output is below:

[Image: rally_slackbot]
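
The real bot has a bit more bookkeeping than I want to paste here, but the core of it is roughly this: poll Rally via pyral for recently updated stories and push the interesting changes to a Slack incoming webhook. This is a sketch, not the actual bot; the webhook URL, credentials, workspace/project names, and field list are placeholders, and it skips the state that keeps it from re-announcing things it has already posted.

    # Rough sketch, not the actual bot: poll Rally and post changes to Slack.
    import requests
    from pyral import Rally

    SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

    rally = Rally("rally1.rallydev.com", user="bot@example.com", password="secret",
                  workspace="My Workspace", project="My Project")

    # Fetch only the fields worth announcing; rank/color churn gets ignored.
    stories = rally.get("HierarchicalRequirement",
                        fetch="FormattedID,Name,ScheduleState,Owner",
                        order="LastUpdateDate DESC", pagesize=20)

    for story in stories:
        owner = story.Owner.Name if story.Owner else "nobody"
        text = "%s %s is now '%s' (owner: %s)" % (
            story.FormattedID, story.Name, story.ScheduleState, owner)
        requests.post(SLACK_WEBHOOK, json={"text": text})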

I’ve got it running via cron every 15 minutes, which is about right. We don’t need to know the exact moment someone moves a story into “In Progress”; finding out within 15 minutes is just fine.

In a fun bit of irony, I really hate working with cron expressions.
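
For the record, the expression itself isn’t the hard part; the crontab entry looks something like this (the script path is made up):

    # run the Rally poller every 15 minutes
    */15 * * * * /usr/bin/python2 /opt/rally_slackbot/poll.py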

TeamCity and the One-Way Quality Ratchet

I’ve recently fallen in love with a simple TeamCity feature: the ability to fail a build when a metric changes compared to the last successful build.

If you, like me, are working with an existing (dare I say, “legacy”) code base, the effort to get everything cleaned up to a minimum standard can be daunting. I’m using FindBugs for static analysis, and while I would like to be at zero errors and warnings, I don’t have the luxury of just stopping all development and doing nothing but fixing them.

And is that even a luxury? Sounds like a rather tedious and soul-crushing exercise.

So, what I’m doing instead is using the TeamCity build to ratchet the number of static analysis issues/warnings steadily toward zero by making it impossible for that number to increment, only decrement.

[Image: metric_change]
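
TeamCity gives you this as a built-in failure condition, but the idea is portable. If your CI tool doesn’t have it, a DIY version of the same one-way ratchet is just a small script: compare the current warning count to a stored baseline, fail the build if it went up, and lower the baseline when it goes down. A sketch, with made-up file names:

    # DIY one-way ratchet for CI tools without a "fail on metric change" option.
    import sys
    from pathlib import Path

    BASELINE_FILE = Path("warning_baseline.txt")      # kept between builds by CI
    CURRENT_FILE = Path("build/findbugs_count.txt")   # written by the analysis step

    current = int(CURRENT_FILE.read_text().strip())
    baseline = int(BASELINE_FILE.read_text().strip()) if BASELINE_FILE.exists() else current

    if current > baseline:
        print("Warning count went up: %d -> %d. Failing the build." % (baseline, current))
        sys.exit(1)

    if current < baseline:
        print("Warning count went down: %d -> %d. Lowering the ratchet." % (baseline, current))
        BASELINE_FILE.write_text("%d\n" % current)

    sys.exit(0)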

Now that I’ve got this particular quality safety net in place, we can just tidy up code as we go along. Sometimes when waiting on someone or something else, I go through the warnings and fix the easy ones.

I can imagine that if you’re in a really nasty legacy code situation, where you have tests that don’t work, you could do the same thing with failing tests, eventually pounding the number of test failures down to zero.

 

Spring Cleaning Time!

You know those people who keep everything forever, even things that they know they don’t need, things that may or may not even work, just because it might come in handy later? In the real world, those people are called hoarders. Sometimes it’s considered a pathology, one that requires intervention by trained mental health experts.

You know those people who never delete their code, even code that they know they don’t need, things that may or may not even work, just because it might come in handy later?  Maybe they mark what they are hoarding with a comment like “This isn’t used anymore” or even the more ambitious “todo: remove altogether?”.  Maybe they just comment out huge sections of code.

This is pathological. If you’re using any form of source control, just clean up the damn garage. If you’re not using source control, then now you have a good reason to start.


Thoughts on the inter-relatedness of software architecture, infrastructure, and process. -or- Three tasty recipes for software development failure.

A few weeks ago I was tasked with putting together, for someone on our board of directors, a brief show-and-tell about the architecture of the software we’re building. No pressure or anything. Easy.

When trying to figure out how to tell the story for a moderately techie audience, I kept coming back to talking about our infrastructure (specifically our automation for build/test/deploy/monitor) and our process (specifically our focus on incrementalism). I realized that they all influence each other in a circular and self-reinforcing way. I couldn’t really tell the story of one without telling the story of the other two.

[Image: triangle]

Credit where credit is due: this idea is influenced by a similar principle from the fascinating psychological field of cognitive behavioral therapy, where the three things that influence each other in a perpetual feedback loop are cognition, emotion, and behavior. 

 

For the longest time, I didn’t really care about infrastructure or process, and all I was concerned with was the architecture and code quality. After all, if you make enough good code fast enough, isn’t that all that matters? I also associated most process with overly-prescriptive one-size-fits-all nonsense (which it usually is). That approach doesn’t scale to a reasonably sized professional team, though. Having a solid (lightweight and flexible, please) process is necessary to have a handle on things once your project is so large it can’t all fit inside a single individual’s head. Similarly, having a solid (lightweight and flexible, please) infrastructure reduces friction and improves safety. If I can do zero-downtime production deployments to make small changes, I’m more likely to do so.

 

Some Real World Examples

“Developers should write automated tests” is (definitely) good process advice, but if the architecture is a tightly coupled mess of arcane but necessary side effects, your test automation experience will be miserable. Similarly, if you don’t have the infrastructure to run the tests in an automated (ideally continuous) way, the miserable time spent on writing tests may be wasted. What’s the opposite of win-win? It’s lose-lose!

“Use pull requests for code reviews” is (usually) good process advice, but only works if you have the infrastructure to support it. GitHub and BitBucket are great. GitLab is also remarkably good considering the price.

“Use the cloud” is (maybe) good infrastructure advice, but if your system architecture isn’t built with “the cloud” in mind, your migration-to-the-cloud will be pointless, needlessly expensive, or (most likely) both.

“Use Micro-Services” is (maybe) good architectural advice, but if you don’t have the infrastructure in place to safely deploy incremental changes, you’ve just exchanged one headache for n headaches. (more on that here…)

Special Bonus: Free Recipes for Software Project Failure!

Recipe variation #1. (basic)

Make sure that the efforts of improving process, architecture, and infrastructure are exclusively owned by different people who are specialized in only one third of the triangle. Professional certification and rigid role expectations should help with this.

Recipe variation #2.  (intermediate)

Absolutely keep these specialist experts from collaborating in any cross-functional way. Isolate them in distinct organizational silos and create incentives built around expanding and protecting their control over their third of the triangle. Creating a culture of blame and paranoia should help with this.

Recipe variation #3. (advanced, only for the very bravest failure chefs)

Give these organizational silos a meaningless and alienating jargon name. For example, you can call your group of hands-off process people (Certified Scrum Masters, ideally) who dictate and enforce everything about process your “Agile Center of Excellence” so that any hope left in the hearts of your software team gets completely snuffed out.

What’s this? Oh, nothing. It’s just an engine for hatred and despair

A friend of mine posted a somewhat anti-“Agile” rant on Facebook a while ago, and I had a momentary urge to throw some arguments back at him. After all, I cannot be silent while one slanders the glorious movement!

Actually, I can.  Any glory of the movement has been extracted and defiled long ago.  Most “Agile” implementations I’ve seen have either been bottom-up revolts from under-empowered developers or top-down mandates from over-empowered managers. Neither approach actually works.

It’s almost as if mixing a prescription of meetings with obscure rugby jargon isn’t enough to fix all of an organization’s problems. You can, if you need to, substitute “XP” or “Lean” or “Kanban” into that last sentence. In the case of Kanban, you get to use Japanese manufacturing jargon instead of rugby jargon, because business school guys steeped in a long tradition of Toyota worship really like that.

Obligatory response to “If it’s not working, you’re not doing it right.” Of course we’re not doing it right. Nobody does it right.

I’m not about to bust out the Gantt charts and sharpen my pencil for 200 page technical specification writing, though. Because as much as I’m no longer really pro-agile, I remain militantly anti-waterfall.

The secret ingredient? It’s HATE. Usually it’s love, but…

 

Why? Let’s look at what happens in a rigid waterfall project:

Firstly, all of the decisions are made at the beginning of the project, a time when (hopefully) everyone knows the least.  Also, as the project is probably going to be relatively long, the business has a perverse incentive to cram as much as possible into the specs instead of thinking small and designing only the things that they know they actually need.

Next, the developers will start to push back and say “woah, that’s too much!” in a way that tends to get framed as a negotiation instead of a collaboration. The business guys are almost always better negotiators than the developers, so the developers walk away feeling steamrolled. Now the developers start to resent the business, as they are committed to a date and deliverable that they couldn’t give meaningful consent to. The macho culture of overwork (long hours, late nights, weekends) only makes that worse.

As this is a big project that you have one shot at, the stakes are really high to get things right the first time. Developers and architects over-develop and over-architect, respectively.  They build themselves hooks and layers and infrastructure and generalizations that they might someday end up needing, just in case.  The software, an abstract and ethereal beast by nature, becomes even more disconnected from any customer-facing business reality. Developers start having arguments with religious zeal about things like style and formatting. The developers start to hate each other.

After months of sound and fury that seem to signify nothing, the business starts to worry. Why haven’t we seen anything? Let’s have more frequent status meetings. Developers feel distrusted and start to hate the business for breathing down their necks.

After a chunk of working at an unsustainable pace, the developers say “Hey! Now we have something to look at! Are you happy now?” Followed almost immediately by “What? You want that to behave differently? But this is what you signed off on!” and “Of course that part isn’t done yet, I didn’t tell you that part was done, did I?” and “Stop complaining about the fonts, we can fix the fonts later. Sheesh.”

Eyes are rolled. The business starts to hate the developers for being so difficult and condescending.

Once the thing more-or-less works, it’s bundled up and sent over the wall to QA, who now have less time than they managed to negotiate and no room to adjust. That’s what you get for being at the tail end of a fixed-date schedule. When they find “real” bugs, developers resent them for making them look bad. When they find things that aren’t “real” bugs, developers berate them for being wrong.

The testers start to feel that they are the only people who care about the users, and that the crappy developers don’t even care enough to fix the goddamn bugs. They fight among themselves about which bugs are “priority one” and which are “priority two” and whether bug #2628 is really a duplicate of bug #2599. The intellectual output of some of the best minds of our generation has been spent arguing over the priority levels of bugs that will probably never get fixed anyway.

Eventually, the “critical” defect count gets low enough that the business decides to release the darn thing. The QA folks think they’re wrong to do so, but there’s nothing they can do to stop it.

Maybe there’s a launch party. We’re done! Bust out the French Champagne! Everyone feels a little more uneasy than celebratory, however.

Customers start to get their hands on it.  It’s radically (and pointlessly) different from the last version, so existing customers hate it because it changes their expectations. It also has a lot of bugs in it, so new customers get to hate it as well.  Tech support hates everyone for releasing this half-baked beast into the wild and putting them in a difficult position. Everyone scrambles to put together “hot fixes”, the releases of which never warrant French Champagne.

Upon hearing all of the complaints, the business obliquely blames QA for not finding all of the bugs (as if the existence of bugs that they didn’t create is somehow their fault) with the semi-rhetorical “did anyone even test this?” QA hates this, naturally, and now they get to hate the developers for making them look bad.

Those who have the strength to keep going get to jump into another cycle! Since all of the problems were obviously caused by a lack of up-front design and documentation, we’ll do even more of that. Also, as the code has degraded into a dangerous pile of spaghetti, broken glass, and human blood, we’ll just completely rewrite it with an entirely new architecture and “paradigm” to have fresh arguments over.

 

Part of the tragedy is that throughout this whole process, everyone is doing their best. The business people aren’t “bad” business people, they just lack super-human foresight. The developers aren’t “bad” developers, they just have normal emotional reactions to work they are invested in. The QA people aren’t “bad” QA people, they just lack super-human omniscience. I have worked with some QA people who were so close to omniscient it was scary, but that’s a different story.

The whole arrangement is an engine for hatred and despair, an engine fueled by tribalism and resentment. The only thing holding these isolated hate-filled groups together is their even greater hatred for everyone else.

It’s also an engine for guilt and shame. People feel that they are never following the “proper” process “properly”, and they sneer at those who don’t even try to follow the process as either “cowboy coders” or “gullible fools who drank the agile kool-aid”.

So, go ahead. Mock “agile” all you want, but let’s remember the historical context, shall we?

Concept: the Developer Blind Spot

Concept #1. Knowledge Work

Software developers are largely “knowledge workers”, which (broadly) means that the job they do day-to-day is something that their manager can’t do, at least not as readily.

Note: this isn’t the typical tribalist anti-business developer rant. I’m not talking about “pointy haired bosses”. I’m talking about the fact that much of the time, developers are managed by people who either aren’t currently, or haven’t ever been, software developers themselves. I think that’s fine, but it leads to something that I’m calling the “developer blind spot”.

Concept #2. Signal Attenuation

[Image: signal_example]

Even the best transmission medium has some signal attenuation.

As an idea goes through multiple people, it can change subtly, so an idea such as “for my service types, I like to encapsulate construction instead of using ‘new foo()’ to get an instance, so my consuming code isn’t tightly coupled to a specific concrete implementation” gets mangled to become “never ever use ‘new foo()‘” which a developer will hear and say “WTF? That doesn’t even make any sense!“. Because it doesn’t make any sense, not once the signal has degraded that far.
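
To make the original, un-attenuated idea concrete: “encapsulate construction” just means consuming code asks a small creation function for an instance instead of newing up a concrete class itself. The names below are invented purely for illustration:

    # Invented example: callers depend on a creation function, not a concrete class.
    class SmtpNotifier:
        def send(self, message):
            print("sending via SMTP:", message)

    class ConsoleNotifier:
        def send(self, message):
            print("(dev mode)", message)

    def create_notifier(env="prod"):
        # Encapsulated construction: swap implementations in one place.
        return ConsoleNotifier() if env == "dev" else SmtpNotifier()

    # Consuming code is coupled only to the send() behavior, not to a concrete type.
    notifier = create_notifier(env="dev")
    notifier.send("build finished")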

Concept #3. Fashion and Skepticism

Many ideas come and go, and it’s time consuming to try to differentiate solid ideas from fads, snake oil, and deep-sounding fluff that some executive got from a breathlessly enthusiastic book he read on an airplane.

In a perfect world, executives would limit their airplane reading to trashy fiction.  Feel free to contact me for specific suggestions as I read a lot of really trashy fiction that I don’t exactly feel comfortable blogging about.

Conclusion

Show. Don’t tell.

If you’re managing or leading a software team and want your team to do things differently, it’s possible that your (brilliant) advice will hit your team right in their blind spot and be ignored.  If you really want to plant an idea into a developer’s head, you need to obliquely inject it, Inception-style. Demonstrate new ideas in a proper real-world context, without appeals to top-down authority, blog posts written by random dudes (especially mine), or light airplane reading. Connect real-world developers with their peers and let them have meaningful conversations without latency and signal attenuation.  Give people enough creative space and room to experiment that they feel invested in these ideas. Anything new must actually make sense with the dozens of micro-scale cost-benefit decisions that good developers are making all day every day. Otherwise, they’re just checking mysterious magic checkboxes on some mysterious magic checklist because some manager says so for mysterious magic reasons.
