The Work That Makes Civilized Life Possible (and finding the people to do it) [REPOST]


So in honor of Systems Administrator Appreciation Day, and because I have a new job in production systems at Athenahealth (more about that at the bottom of this post), I’m slightly modifying and reposting a blast from the recent past.

“So what exactly would you say you do here?”

I’ve flown out to remote locations and been on-site for the build-out and spin-up of three new production data centers within the last year. I’ve been present for load tests and at public launches of new video games’ online services, and at product and feature launches, to predict and solve system load issues from rushes of new customers hitting new code, networks, and servers. And yes, I’ve spent my share of all-nighters in war rooms and in server rooms troubleshooting incidents and complicated failure events that have taken parts of web sites, or entire online properties, offline. I wasn’t personally involved in fixing healthcare.gov, but that team was made up of people I would consider my peers, including some who have been co-workers of mine in the past.

Do you use the internet? Ever buy anything online? Use Facebook? Have a Netflix account? Ever do a search on Duck Duck Go or use Google? Do you have a bank account? Do you have a criminal record, or not? Ever been pulled over? Have you made a phone call in the past 10 years? Is your metadata being collected by the NSA? Have you ever been to the hospital, doctor’s office, or pharmacy? Do you play video games? If you’ve answered yes to any of the above questions, then a portion of your life (and livelihood) depends on a particular group of professional engineers who do what I do. No, we are not a secret society of illuminati or lizard people. We do, however, work mostly in the background, away from the spotlight, and ensure the correct operation of many parts of our modern, digital world.

So what do we call ourselves? That’s often the first challenge I face when someone asks me what I do for a living. My job titles, and the titles of my peers, have changed over the years. Some of us were called “operators” back in the early days of room-sized computers and massive tape drives. When I graduated college and got my first job I was referred to as a “systems administrator” or “sysadmin” for short. These days, the skill sets required to keep increasingly varied and complex digital infrastructure functioning properly have become specialized enough that this is almost universally considered a distinct field of engineering rather than just “administration” or “operations”. We often refer to ourselves now as “systems engineers,” “systems architects,” “production engineers,” or to use a term coined at Google but now used more widely, “site reliability engineers.”

What does my job entail specifically? There are scripting languages, automated configuration and server deployment packages, common technology standards, and large amounts of monitoring and metrics feedback from the complex systems that we create and work on. These are the tools we need to scale to handle growing populations of customers and increased traffic every day. This is a somewhat unique skill set and engineering field. Many of us have computer science degrees (I happen to), but many of us don’t. Most of the skills and techniques I use to do my job were not learned in school, but through my years of experience and an informal system of mentorship and apprenticeship in this odd guild. I wouldn’t consider myself a software engineer, but I know how to program in several languages. I didn’t write any of the code or design any of our website, but my team and teams like it are responsible for deploying that code and services, monitoring function, making sure the underlying servers, network and operating systems function properly, and maintaining operations through growth and evolution of the product, spikes in traffic, and any other unusual things.

“Skill shortage”

Back in 2001, I was working for the University of Illinois at Urbana-Champaign for the campus information services department (then known as CCSO) as a primary engineer of the campus email and file storage systems. Both were rather large by 2001 standards, with over 60,000 accounts and about a terabyte (omg a whole terabyte!) of storage. This was still in the early part of the exponential growth of the internet and digital services. I remember a presentation by Sun Microsystems in which they stated that given the current growth rates and server/admin ratios, by 2015 about ⅓ of the U.S. population would need to be sysadmins of some sort. They were probably right, but the good news is that since then our job has shifted mostly to finding efficiencies and making the management of systems and services of ever-growing scale and complexity possible without actual manual administration or operation, so the number of servers each admin can handle has gone up dramatically. Back then it was around 1 admin for every 25 servers in an academic environment like UIUC. Today, the common ratios in industry range from a few hundred to a few thousand servers per engineer. I don’t think I’m allowed to say publicly what the specific numbers are here at TripAdvisor, but it is within that range. But we still need new engineers every day to meet needs as the internet scales, and as we need to find even more efficiencies to continue to crank that ratio up.

Where do the production operations engineers come from? Many of us are ex-military, went to trade schools, or came to the career through a desire to tinker unrelated to college training. As I stated earlier, while a degree in computer science helps a lot in understanding the foundations of what I do, many of the best engineers I’ve had the pleasure of working with are art, philosophy, or rhetoric majors. In hiring, we look for people who have strong problem-solving desires and abilities, people who handle pressure well, who sometimes like to break things or take them apart to see how they work, and people who are flexible and open to changing requirements and environments. I believe that, because for a while computers just “worked” for people, a whole generation of young people in college, or just graduating college, never had the need or interest to look under the hood at how systems and networks work. In contrast, while I was in college, we had to compile our own Linux kernels to get video support working, and do endless troubleshooting on our own computers just to make them usable for coding and, in some cases, daily operation on the campus network.

So generally speaking, recent college graduates trained in computer science have tended to gravitate towards the more “glamorous” software engineering and design positions, and continue to. How do we attract more interest in our open positions, and in the career as an option as early as college? I don’t have a good answer for that. I’ve asked my peers, and many of them don’t know either. I was thrilled to go to the 2014 SREcon in Santa Clara last year (https://www.usenix.org/conference/srecon14), and to attend and present at SREcon 2015 this year. A huge portion of the discussion panels there dealt with hiring, and the engineers and managers from all the big Silicon Valley outfits (Facebook, Google, Twitter, Dropbox, etc.) face the same problems. It’s admittedly even worse for us at TripAdvisor as an east coast company fighting against the inexorable pull of Silicon Valley on the talent pool here.

One thing I’ve come to strongly believe, and which I think is becoming the norm in industry operations groups, is that we need to broaden our hiring windows more. We need to attract young talent and bring in the young engineers, who may not even be strictly sure that they want an operations or devops career, and show them how awesome and cool it really is (ok, at least I think it is). To this end, I gave a talk at MIT a little over a year ago on this subject — check out the slides and notes here. I didn’t know that this is what I wanted to do for sure until about a week before I graduated from MIT in 2000. I had two post-graduation job offers on the table, and I chose a position as an entry-level UNIX systems administrator at Massachusetts General Hospital (radiation oncology department, to be more specific) over the higher paying Java software engineering job at some outfit named Lava Stream (which as far as I can tell does not exist anymore). Turns out I made the right decision. The rest of my career history is in my LinkedIn profile (https://www.linkedin.com/profile/view?id=8091411) if anyone is curious. No, I’m not looking for a new job.

“Now (and forever) Hiring”

So, if anyone reading this is entering college, or just leaving college, or thinking of a career change, give operations some consideration. Maybe teach yourself some Linux skills. Take some online classes if you have time or think you need to. Brush up your Python and shell scripting skills. At least become a hobbyist at home and figure out some of those skills you see in our open job positions (Nagios, Apache, Puppet, Hadoop, Redis, whatever). Who knows, you might like it, and find yourself in a career where recruiters call you every other day and you can pretty much name your own salary and the company you want to work for.

And specifically for my group at Athenahealth? We manage the production infrastructure for athenaNet. We are a cloud-based medical services company working to build the healthcare internet and to improve healthcare in the U.S. through our innovation. We are the #1 practice management system and #2 electronic health record (EHR) system in the country according to the 2014 KLAS survey. The infrastructure that my team runs is counted on and trusted by over 67,500 medical providers and millions of patients. Does any of this sound interesting to you? Even if you don’t think you fit any of the descriptions we have currently listed (like this one) but might be up for some mentoring/training and maybe an internship or more entry-level position, tweet at me or drop me an email and we’ll see what we can do. See you out there on the internets.


It “snow” comparison


UPDATE: Here is a composite picture that tells part of the story. Click for the full-size image. On the left is from last weekend (February 7th, 4:11pm). On the right is this morning (February 15th, 11:15am). The walls and mountains of snow just keep growing. This includes another 14″ or so that has been added in the past 24 hours (after writing the blog post below).


Please excuse the pun in the title (or don’t). If you haven’t heard, Boston has gotten an unprecedented amount of snow over the past three weeks and will probably end up with about 70″ within a 30-day period.

I come from Rochester, New York. Depending on your statistics, it is considered one of the snowiest cities in the U.S. (and sometimes is #1 on that list). So my thought has been: have I just been living outside of the snow belt for so long that what used to not be a big deal in my normal winter experience is now completely bizarre to me?

Let’s start by looking at the current leaders of this season’s “snow bowl”. Boston is right there in between Buffalo, Syracuse and Rochester, and is behind the total amount of snow that snow-belt cities Buffalo and Syracuse had at this point last year, and about equal with Rochester. So yes, this is an unbelievably large amount of snow for Boston, but not unprecedented for some urban areas.

The statements so far from the mayor and officials from the local mass transit authority (MBTA) indicate that the real issue here has been an inability to clear snow from streets and train tracks, and to maintain equipment that the snow and cold have damaged. Mass transit is completely shut down again tomorrow, and a snow emergency continues. This is a good move, in my opinion, because we need an extra day of cars and people not being out there so that the city can do something about removing the mountains of snow.

Re-opening the city so soon after the first storm two weeks ago was, in retrospect, what got us into the current mess. You can only shove so much snow aside before there’s just too much of it and the streets narrow until they are impassable. This is precisely the situation we were in here for most of last week. Traffic was at a standstill as two-lane streets became one (or even 1/2 lane) with parked cars and mountains of snow competing for road space. Let’s hope the crews make some good progress cleaning up the streets and sidewalks tomorrow while we’re all home from work. I also hope that the embattled MBTA can get its act together. However, the lack of budgetary attention paid to that agency by the state over the past few years, and downright animosity from residents of the western suburban and rural parts of the state when faced with their tax dollars paying for “city” infrastructure, have left the agency in a spot where it just doesn’t have enough resources to remove all of the snow and repair/maintain equipment. By some estimates last week, about half of the trains on certain subway lines were out of service due to weather-related malfunction.

So, good luck to them. And they should get a move-on because the current NOAA forecast discussion indicates a good chance of a potentially significant storm this coming Thursday, and another one Saturday into Sunday.

Now, back to my original question about 70″ in a month. I looked around for a record of this happening in a city before (particularly a major city, like Boston). I found a few similar incidents in smaller cities, though, all in the “snow belt” region of Buffalo-Rochester-Syracuse, of course. Back in 1985, Buffalo had 68 inches of snow in December. That’s less than we’re facing here now, and in a city that’s smaller and probably has a much easier time with snow removal and street clearing (not to mention no large subway/trolley system to also keep clear). In December 2001, the city of Buffalo had a record 83 inches of snowfall, with a maximum of 44 inches on the ground at one point. That seems to indicate to me that there was some sort of thawing in between, a luxury we have not had here in the city of Boston over the past three weeks. In Syracuse, however, there was also a 64 inch December, and supposedly a 97 inch January back in 1966 that I really want to find and read some more about. But other than those few anomalies? I couldn’t find a month of more than 52 inches of snow for Buffalo, Rochester, or Syracuse.

So yeah, 70″ in less than a month is very, very rare for *any* place, even the snowiest cities in this country that deal with blizzards regularly. It’s certainly unprecedented for a major city with a multi-mode mass-transit system and a population over 640,000 (4.5 million in the “Greater Boston” MSA). In other words, I haven’t gone soft. This really is a whole lot of snow.


Ranking The Months From Best To Worst


From best to worst:

1. September

2. April

3. October

4. August

5. July

6. June

7. May

8. November

9. January

10. December

11. March

12. February

Agree?  Disagree?  Discuss!


Gmail Password Leak Update


benoc:

Still thinking two-factor auth for Google (and other accounts) isn’t worth the trouble? Might be time to think again. http://www.google.com/landing/2step/

Originally posted on WordPress.com News:

This week, a group of hackers released a list of about 5 million Gmail addresses and passwords. This list was not generated as a result of an exploit of WordPress.com, but since a number of emails on the list matched email addresses associated with WordPress.com accounts, we took steps to protect our users.

We downloaded the list, compared it to our user database, and proactively reset over 100,000 accounts for which the password given in the list matched the WordPress.com password. We also sent email notification of the password reset containing instructions for regaining access to the account. Users who received the email were instructed to follow these steps:

  1. Go to WordPress.com.
  2. Click the “Login” button on the homepage.
  3. Click on the link “Lost your password?”
  4. Enter your WordPress.com username.
  5. Click the “Get New Password” button.

In general, it’s very important that passwords be unique for each account. Using the same…
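For the curious, here is a rough idea of what "compare the list to our user database" could look like. This is only a sketch under assumptions of mine: the excerpt above does not say how WordPress.com stores passwords or ran the comparison, so the salted PBKDF2 hashing, the `users` dictionary, and the function names below are all hypothetical. The point is just that you never need the stored passwords in plaintext; you re-hash each leaked password with the account's salt and compare.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash; the storage scheme here is an assumption."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def make_user(password: str) -> dict:
    """Create a hypothetical user record holding only a salt and a hash."""
    salt = os.urandom(16)
    return {"salt": salt, "hash": hash_password(password, salt)}


def find_compromised(users: dict, leaked_pairs) -> list:
    """Return the emails whose leaked password matches the stored hash."""
    compromised = []
    for email, leaked_password in leaked_pairs:
        record = users.get(email)
        if record is None:
            continue  # address is not one of our accounts
        candidate = hash_password(leaked_password, record["salt"])
        # compare_digest avoids leaking information through timing
        if hmac.compare_digest(candidate, record["hash"]):
            compromised.append(email)
    return compromised


# Made-up accounts and a made-up leak, purely for illustration
users = {
    "alice@example.com": make_user("correct horse"),
    "bob@example.com": make_user("hunter2"),
}
leaked = [
    ("bob@example.com", "hunter2"),         # reused password: should be reset
    ("alice@example.com", "old-password"),  # on the list, but password differs
    ("carol@example.com", "whatever"),      # not an account here
]
to_reset = find_compromised(users, leaked)
```

Only the account whose leaked password actually matches ends up in `to_reset`, which lines up with the excerpt's description of resetting only the accounts where the password matched.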



Bought a House — See New Blog!


So, for those who haven’t been privy to the news, Kristy and I bought a house.  Rather than clog up this blog with that stuff though, I’ve started a new one:

http://rebuildingwheneverland.wordpress.com

First post is up with the basic story and some “before” pictures.  We close officially tomorrow afternoon, and moving day is September 23rd.  As you can see if you look at that blog and gallery, we’ve got a lot of work ahead of us!

House From The Street



The Work That Makes Civilized Life Possible (and finding the people to do it)


“So what exactly would you say you do here?”

I’ve flown out to remote locations and been on-site for the build-out and spin-up of three new production data centers within the last 10 months. I’ve been present for load tests and at public launches of new video games’ online services, and at product and feature launches, to predict and solve system load issues from rushes of new customers hitting new code, networks, and servers. And yes, I’ve spent my share of all-nighters in war rooms and in server rooms troubleshooting incidents and complicated failure events that have taken parts of web sites, or entire online properties, offline. I wasn’t personally involved in fixing healthcare.gov late last year, but that team was made up of people I would consider my peers, including some who have been co-workers of mine in the past.

Do you use the internet? Ever buy anything online? Use Facebook? Have a Netflix account? Ever do a search on Duck Duck Go or use Google? Do you have a bank account? Do you have a criminal record, or not? Ever been pulled over? Have you made a phone call in the past 10 years? Is your metadata being collected by the NSA? Have you ever been to the hospital, doctor’s office, or pharmacy? Do you play video games? If you’ve answered yes to any of the above questions, then a portion of your life (and livelihood) depends on a particular group of professional engineers who do what I do. No, we are not a secret society of illuminati or lizard people. We do, however, work mostly in the background, away from the spotlight, and ensure the correct operation of many parts of our modern, digital world.

So what do we call ourselves? That’s often the first challenge I face when someone asks me what I do for a living. My job titles, and the titles of my peers, have changed over the years. Some of us were called “operators” back in the early days of room-sized computers and massive tape drives. When I graduated college and got my first job I was referred to as a “systems administrator” or “sysadmin” for short. These days, the skill sets required to keep increasingly varied and complex digital infrastructure functioning properly have become specialized enough that this is almost universally considered a distinct field of engineering rather than just “administration” or “operations”. We often refer to ourselves now as “systems engineers,” “systems architects,” “production engineers,” or to use a term coined at Google but now used more widely, “site reliability engineers.”

What does my job entail specifically? There are scripting languages, automated configuration and server deployment packages, common technology standards, and large amounts of monitoring and metrics feedback from the complex systems that we create and work on. These are the tools we need to scale to handle growing populations of customers and increased traffic every day. This is a somewhat unique skill set and engineering field. Many of us have computer science degrees (I happen to), but many of us don’t. Most of the skills and techniques I use to do my job were not learned in school, but through my years of experience and an informal system of mentorship and apprenticeship in this odd guild. I wouldn’t consider myself a software engineer, but I know how to program in several languages. I didn’t write any of the code or design any of our website, but my team and teams like it are responsible for deploying that code and services, monitoring function, making sure the underlying servers, network and operating systems function properly, and maintaining operations through growth and evolution of the product, spikes in traffic, and any other unusual things.

“Skill shortage”

Back in 2001, I was working for the University of Illinois at Urbana-Champaign for the campus information services department (then known as CCSO) as a primary engineer of the campus email and file storage systems. Both were rather large by 2001 standards, with over 60,000 accounts and about a terabyte (omg a whole terabyte!) of storage. This was still in the early part of the exponential growth of the internet and digital services. I remember a presentation by Sun Microsystems in which they stated that given the current growth rates and server/admin ratios, by 2015 about ⅓ of the U.S. population would need to be sysadmins of some sort. They were probably right, but the good news is that since then our job has shifted mostly to finding efficiencies and making the management of systems and services of ever-growing scale and complexity possible without actual manual administration or operation, so the number of servers each admin can handle has gone up dramatically. Back then it was around 1 admin for every 25 servers in an academic environment like UIUC. Today, the common ratios in industry range from a few hundred to a few thousand servers per engineer. I don’t think I’m allowed to say publicly what the specific numbers are here at TripAdvisor, but it is within that range. But we still need new engineers every day to meet needs as the internet scales, and as we need to find even more efficiencies to continue to crank that ratio up.

Where do the production operations engineers come from? Many of us are ex-military, went to trade schools, or came to the career through a desire to tinker unrelated to college training. As I stated earlier, while a degree in computer science helps a lot in understanding the foundations of what I do, many of the best engineers I’ve had the pleasure of working with are art, philosophy, or rhetoric majors. In hiring, we look for people who have strong problem-solving desires and abilities, people who handle pressure well, who sometimes like to break things or take them apart to see how they work, and people who are flexible and open to changing requirements and environments. I believe that, because for a while computers just “worked” for people, a whole generation of young people in college, or just graduating college, never had the need or interest to look under the hood at how systems and networks work. In contrast, while I was in college, we had to compile our own Linux kernels to get video support working, and do endless troubleshooting on our own computers just to make them usable for coding and, in some cases, daily operation on the campus network.

So generally speaking, recent college graduates trained in computer science have tended to gravitate towards the more “glamorous” software engineering and design positions, and continue to. How do we attract more interest in our open positions, and in the career as an option as early as college? I don’t have a good answer for that. I’ve asked my peers, and many of them don’t know either. I was thrilled to go to the 2014 SREcon in Santa Clara earlier this month (https://www.usenix.org/conference/srecon14), and judging by the discussion panels there, the engineers and managers from all the big Silicon Valley outfits (Facebook, Google, Twitter, Dropbox, etc.) for the most part face the same problem. It’s admittedly even worse for us at TripAdvisor as an east coast company fighting against the inexorable pull of Silicon Valley on the talent pool here.

One thing I’ve come to strongly believe, and which I think is becoming the norm in industry operations groups, is that we need to broaden our hiring windows more. We need to attract young talent and bring in the young engineers, who may not even be strictly sure that they want an operations or devops career, and show them how awesome and cool it really is (ok, at least I think it is). To this end, I gave a talk at MIT a little over a year ago on this subject — check out the slides and notes here. I didn’t know that this is what I wanted to do for sure until about a week before I graduated from MIT in 2000. I had two post-graduation job offers on the table, and I chose a position as an entry-level UNIX systems administrator at Massachusetts General Hospital (radiation oncology department, to be more specific) over the higher paying Java software engineering job at some outfit named Lava Stream (which as far as I can tell does not exist anymore). Turns out I made the right decision. The rest of my career history is in my LinkedIn profile (https://www.linkedin.com/profile/view?id=8091411) if anyone is curious. No, I’m not looking for a new job.

“Now (and forever) Hiring”

So, if anyone reading this is entering college, or just leaving college, or thinking of a career change, give operations some consideration. Maybe teach yourself some Linux skills. Take some online classes if you have time or think you need to. Brush up your Python and shell scripting skills. At least become a hobbyist at home and figure out some of those skills you see in our open job positions (Nagios, Apache, Puppet, Hadoop, Redis, whatever). Who knows, you might like it, and find yourself in a career where recruiters call you every other day and you can pretty much name your own salary and the company you want to work for.

And specifically for my group at TripAdvisor? We manage the world’s largest travel site’s production infrastructure. It’s a fast-moving speed-wins type of place (see my previous blog post) and we are hiring. Any of this sound interesting to you? Even if you don’t think you fit any of the descriptions below but might be up for some mentoring/training and maybe an internship or more entry-level position, tweet at me or drop me an email and we’ll see what we can do. See you out there on the internets.

Job Opening: Technical Operations Engineer

TripAdvisor is seeking a senior-level production operations engineer to
join our technical operations team. The primary focus of the technical
operations team is the build-out and ongoing management of TripAdvisor’s
production systems and infrastructure.

You will be designing, implementing, maintaining, and troubleshooting
systems that run the world's largest travel site across several
datacenters and continents. TripAdvisor is a very fast growing and
innovative site, and our technical operations engineers require the
flexibility and knowledge to adapt and respond to challenging and
novel situations every day.

A successful candidate for this role must have strong system and network
troubleshooting skills, a desire for automation, and a willingness to
tackle problems quickly and at scale all the way from the hardware and
kernel level, up the stack to our database, backend, web services and
code.

Some Responsibilities:
- Monitoring/trending of production systems and network
- General linux systems administration
- Troubleshooting performance issues
- DNS and Authentication administration
- Datacenter, network build-outs to support continued growth
- Network management and administration
- Part of a 24x7 emergency response team

Some Desired Qualifications:
- Deep knowledge of Linux
- Experienced in use of scripting and programming languages 
- Experience with high traffic, internet-facing services
- Experience with alerting and trending packages like Nagios, Cacti
- Experience with environment automation tools (puppet, kickstart, etc.)
- Experience with virtualization technology (KVM preferred)
- Experience with network switches, routers and firewalls

Job Opening:  Information Security Engineer

TripAdvisor is seeking an Information Security Engineer to join our 
operations team. You will be charged with the responsibilities for 
overall information security for all the systems powering our sites, the 
information workflow for the sites and operational procedures, as well 
as the access of information from offices and remote work locations.

Do you have the talent to not only design, but actually implement and 
potentially automate firewall, IDS/IPS configuration changes and manage 
day-to-day operations? Can you implement and manage vulnerability scans, 
penetration tests and audit security infrastructure?

You will be collaborating with product owners, product engineers, and
operations engineers to understand business priorities and goals, company 
culture, development processes, operational processes to identify risks 
and then work with teams on designing and implementing solutions or 
mitigations. You will be the information security expert in the company 
who tracks and monitors new/emerging vulnerabilities, exploitation 
techniques and attack vectors, and evaluates their impacts on 
services in production and under development. You will provide support 
for audit and remediation activities. You will be working hands-on on our 
production systems and network equipment to enact policy and maintain a 
secure and scalable environment.

Desired Skills and Experience

* BSc or higher degree in Computing Science or equivalent desired
* Relevant work experience (10+ yrs) in securing systems and infrastructure
* Prior experience in penetration testing, vulnerability management, forensics
* Prior experience with IDS/IPS and firewall config/management required
* Experience with high traffic, Internet-facing services
* Ability to understand and integrate business drivers and priorities into design
* Strong problem solving and analytical skills
* Strong communication skills with both product management and engineering
* Familiar with OWASP Top-10
* Relevant certifications (CISSP, GIAC Gold/Platinum, and CISM) a plus


Heartbleed, Internet Security and What it Means to You


For those not in the know, or catching any of the news stories that are popping up today in mainstream media, we are in the midst of dealing with a very serious vulnerability that has been discovered in the foundation of secure data transmission on the internet. While many of the news stories out there are filled with some ridiculous hyperbole, it would be dangerous to understate the criticality of what was discovered.

SSL (Secure Sockets Layer) is a protocol for letting your computer and other systems communicate across the internet with negotiated encryption (so people can’t snoop on your passwords and other sensitive transmitted information), and authentication (so you have a way of knowing that when you’re filling in information at your bank’s website it actually is going to your bank’s website). Anytime you’re at a website with “https” in the URL, or that little lock icon in your address bar, your communications are protected by this protocol, and code running in your browser and on the server you’re communicating with does the work of encrypting and decrypting the information flying through the tubes. The SSL protocol was initially developed by our old friends at Netscape in the mid-1990s, and is what makes e-commerce and a good portion of our modern economy and communications possible.
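If you want to see the two halves (encryption and authentication) from the client side, here is a minimal sketch using Python's standard `ssl` module. The helper names `make_client_context` and `fetch_peer_info` are mine, made up for illustration; the library calls themselves are standard. The default context turns on certificate verification and hostname checking, which is the "is this really my bank?" part, and wrapping the socket negotiates the encryption.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client context with certificate and hostname checks enabled."""
    # create_default_context() loads the system CA store and enables
    # certificate verification and hostname checking by default.
    return ssl.create_default_context()


def fetch_peer_info(host: str, port: int = 443) -> dict:
    """Connect to host:port and report what was negotiated (needs network)."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        # server_hostname is sent to the server and is also the name
        # checked against the certificate the server presents.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return {
                "protocol": tls_sock.version(),
                "cipher": tls_sock.cipher()[0],
                "subject": tls_sock.getpeercert().get("subject"),
            }


context = make_client_context()
```

Calling something like `fetch_peer_info("example.com")` from a machine with internet access should return the negotiated protocol version and cipher; if the certificate does not match the hostname, the handshake fails with an error instead of quietly connecting.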

The Heartbleed Bug lets any attacker send a somewhat-carefully crafted message to a web server running a vulnerable version of OpenSSL (a widely used implementation of this protocol) and get back arbitrary contents of the memory within that server. This is, sadly, not an uncommon type of bug (anyone who has ever programmed will recognize the horror and commonality of array bounds-checking problems and buffer overflow problems). On a web server, however, some things that get returned from memory when it is poked with this attack include:

  • The web server’s secret key – This is the key that’s used to actually encrypt all traffic. If you are running a secure website and were vulnerable to this bug, in my opinion, you should assume that your key has been compromised and generate a new key and certificate for encrypting future traffic. Fortunately, due to the “authentication” part of the SSL protocol, in order to take advantage of having a server private key and certificate, you’d have to launch a “man in the middle” attack — which takes a bit more work and often involves actually penetrating the network of your victim and/or hijacking internet DNS service for your victim. Still, this is a very bad thing to leak.
  • Sensitive Information – Usernames, passwords, and things filled out in forms and submitted to the website by other customers at the time the attack is launched will be present in the server memory in plaintext and can be retrieved. It’s not a bad idea to change your passwords regularly on websites anyways, but this bug might provoke you to go and do it right now.
  • Session Cookies – Many secure websites keep track of which users are logged in and which aren’t by sharing a little bit of data with you known as a “cookie.” It’s pretty much a magic number that your browser can present to the website to say “hey it’s me again.” The web server will then look it up in the database to say “oh yeah, you logged in successfully a few hours ago, you’re still good.” This is how you can go to websites like Facebook repeatedly and not have to enter your password over and over again. Other users’ session cookies will be present in the server memory in plaintext and can be retrieved by this attack. An attacker who replays a stolen cookie gets to act as that logged-in user without ever knowing the password. This is called “sidejacking” and is (in my opinion) the most frightening aspect of this bug. This blog has a more detailed example of using this vulnerability to do a sidejacking, and confirms that this is possible on at least one “fairly popular website.”
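
For the programmers in the audience, here is a rough sketch of the kind of missing bounds check behind the bug. This is a toy model in Python, not OpenSSL’s actual code: the function names, the fake “memory” layout, and the sample secrets are all made up for illustration. The only idea it borrows from the real thing is that the request carries both a payload and a claimed payload length, and the vulnerable handler trusts the claim.

```python
# Toy model of a heartbeat handler. Everything here is hypothetical:
# real OpenSSL is written in C and reads from the process heap.

# Pretend this data belongs to other users and happens to sit in memory
# right after the buffer that holds the incoming heartbeat payload.
OTHER_DATA = b"user=alice&password=hunter2;session=abc123"


def build_memory(payload: bytes) -> bytes:
    """Lay the request payload out next to unrelated sensitive data."""
    return payload + OTHER_DATA


def heartbeat_vulnerable(payload: bytes, claimed_len: int) -> bytes:
    """Echo back claimed_len bytes without checking the real payload size."""
    memory = build_memory(payload)
    # BUG: claimed_len comes from the attacker and is never compared to
    # len(payload), so the reply can run past the payload into OTHER_DATA.
    return memory[:claimed_len]


def heartbeat_fixed(payload: bytes, claimed_len: int) -> bytes:
    """Discard any request whose claimed length exceeds the real payload."""
    if claimed_len > len(payload):
        return b""  # malformed request: send nothing back
    memory = build_memory(payload)
    return memory[:claimed_len]


if __name__ == "__main__":
    # An honest client: 2 bytes of payload, claims 2 bytes.
    print(heartbeat_vulnerable(b"hi", 2))
    # An attacker: 2 bytes of payload, claims 64 bytes.
    print(heartbeat_vulnerable(b"hi", 64))
    print(heartbeat_fixed(b"hi", 64))
```

The attacker’s request looks almost identical to the honest one. The only difference is a number that the server should never have taken on faith.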

This bug was disclosed in what we call a “responsible” manner. The researchers that were supposedly first to discover it did not release it to the public, but went directly to the OpenSSL project and, in turn, large stakeholders were notified several weeks ago. It can be assumed that sites like Google and Facebook, Akamai (which is good because they actually terminate a good portion of the web’s SSL, including TripAdvisor’s), and hosting providers like CloudFlare had already repaired the vulnerability before yesterday. Sadly, it appears that the publication of the vulnerability on April 7th was earlier than hoped. Linux distribution providers (Debian, CentOS, Red Hat, Ubuntu), who provide the OpenSSL code packages that people like me actually have to install on our web servers, were in some cases not providing a fix until late in the evening on the 7th, well after exploit code was in the wild. Furthermore, while I trust the researchers listed as the discoverers of this bug, I cannot (nor should anyone) be 100% certain that someone else hadn’t already discovered this problem and been attacking websites with it for several months, stealing private keys, sensitive information, and credentials. So while it’s comforting that responsible disclosure and fast action on the part of the people that run the web sites you visit every day (people like me) have potentially mitigated the problem, the consequences of this vulnerability are (as you can see in the list above) far reaching and somewhat frightening.

“So as a regular person, how worried should I be?” This is a common question a lot of people have been asking in the past day or two. I can’t pretend to understand your own risk and paranoia level, but I will attempt to convey how I feel. This is not a reason to stop trusting the little lock icon in your browser or the “https” in the URL. Bugs happen, sometimes information is leaked, and then they get fixed. Any damage done by this has already been done, and there’s no reason to yank out your ethernet cables and delete your Facebook and Twitter accounts. What you should do (and should be doing already) is practice some common sense web security techniques. If there’s a bright side to this bug, it’s that this may increase everyone’s awareness and get people to do the following:

  • Change your passwords: This is a no-brainer. If anyone gets your account information (through this vulnerability or any other means), it’s useless if you change your passwords. I do this every few months.
  • Don’t use the same passwords on multiple sites: This is a common problem. Here at TripAdvisor the only thing your password protects is a bunch of travel reviews. You may think “oh whatever, big deal.” But research (and anecdotal evidence) shows that many people use the same exact password and username on many sites. The same username and password a user uses on TripAdvisor may very well be their Gmail password, or the password for their online banking, or Facebook or Twitter. Websites get hacked all the time (none that I’m responsible for, of course, LOL [yes, I just typed LOL]), sometimes without the public even knowing about it. So be smart. Even I don’t use a unique password for every website, but I have a set of four or five that I use for different classes of sites (social media password, email password, financial services password, shell login password, etc.).
  • Pick a good password: People have been saying this forever, but I will say it again. Quick story: when I was at UIUC running the campus Email and UNIX shell/file sharing services, we first ran a password cracker against our users’ accounts. The way that these “brute force” attacks work is that an attacker will attempt to log in using dictionary words, names and other things. The most common password, by far, was actually password. Among the top 5 were also fuckyou, ncc1701, various people’s names (obviously people choose their girlfriend/boyfriend/mother/father’s names for passwords), and in several dozen cases people actually used their usernames as their passwords. These days many websites will prevent you from using a weak password. So don’t be dumb. Pick a good password. It should not be dictionary-word based. Even replacing letters with numbers is easily decoded by brute-force attackers, so don’t think you’re fooling anyone. Don’t use anyone’s name in your password either. And don’t even use a combination of dictionary words, names, and l33t-sp34k numbering. The brute-force password crackers are at least as smart as you and have a lot more time and computing power.
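
To show why swapping letters for numbers doesn’t buy you anything, here is a minimal sketch of the kind of check a cracker (or a website’s weak-password filter) might run. The word list, the substitution table, and the function name are all assumptions for the example; real cracking tools use far larger dictionaries and many more rewrite rules.

```python
# Hypothetical, tiny word list. Real crackers use millions of entries.
WORDLIST = {"password", "fuckyou", "ncc1701", "jennifer", "michael"}

# Undo common "l33t" substitutions before looking the word up.
LEET = str.maketrans({
    "0": "o", "1": "l", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def is_weak(password: str, username: str) -> bool:
    """Return True if a trivial dictionary attack would guess this password."""
    lowered = password.lower()
    if lowered == username.lower():
        return True  # username reused as the password
    if lowered in WORDLIST:
        return True  # straight dictionary hit
    # Same lookup again after reversing the number/symbol swaps.
    return lowered.translate(LEET) in WORDLIST


if __name__ == "__main__":
    for candidate in ["password", "p4$$w0rd", "jsmith", "xK9#mQ2vLp!r"]:
        print(candidate, is_weak(candidate, username="jsmith"))
```

One extra lookup is all it takes to undo the substitutions, which is why “p4$$w0rd” is no safer than “password.”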

“So as a website operator or systems engineer, what should I do?” You should act immediately if you have not already. If you run your own web server, upgrade your OpenSSL package right this goddamn minute. Also, since the library is loaded in memory at service-start time, you will need to restart your web server and any other service relying on the flawed library. To be safe, just reboot after you upgrade the package. There also might be code that was built statically linked to the flawed library. In that case you’ll have to recompile and re-install it. Run common vulnerability scanners like nessus (or other tools available) against everything you have running. If you have a website that’s hosted elsewhere, contact your hosting provider immediately. Make sure they are patched and no longer vulnerable. Also, replace your SSL key and certificate. Some will say that this step is overly paranoid, and your hosting provider might even give you shit for insisting that they generate a new key and certificate for you. As I stated above, while these researchers responsibly disclosed this bug, the possibility that this was out in the wild before cannot be dismissed.
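
As a first-pass triage aid, here is a small sketch that flags upstream OpenSSL version strings in the affected 1.0.1 series (1.0.1 through 1.0.1f; the fix shipped in 1.0.1g). The function name is made up for this example. Two loud caveats: it deliberately ignores the 1.0.2 beta builds, and Linux distributions often backport the fix without changing the upstream version letter, so a “vulnerable-looking” string from a patched distro package is entirely possible. Treat it as a hint, not a verdict.

```python
import re

# Matches the first "major.minor.fix" plus optional patch letter,
# e.g. "1.0.1f" inside "OpenSSL 1.0.1f 6 Jan 2014".
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)([a-z]?)")


def in_affected_1_0_1_range(version: str) -> bool:
    """True if an upstream version string falls in 1.0.1 through 1.0.1f.

    Assumption: the string is an upstream OpenSSL version. Distro
    packages with backported fixes will be misreported as affected.
    """
    match = _VERSION_RE.search(version)
    if match is None:
        raise ValueError(f"no version number found in {version!r}")
    major, minor, fix = (int(match.group(i)) for i in (1, 2, 3))
    letter = match.group(4)
    if (major, minor, fix) != (1, 0, 1):
        return False  # 0.9.8 and 1.0.0 never had the heartbeat code
    # Plain "1.0.1" has no letter; letters "a" through "f" are affected.
    return letter == "" or letter <= "f"


if __name__ == "__main__":
    import ssl
    # ssl.OPENSSL_VERSION is the library this Python was linked against,
    # which is not necessarily the one your web server uses.
    print(ssl.OPENSSL_VERSION, in_affected_1_0_1_range(ssl.OPENSSL_VERSION))
```

Whatever this says, the real test is still patching, restarting, and scanning the running service.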

Timeline:

  • December 2011: Bug is introduced into the heartbeat function of the OpenSSL library
  • March 14th 2012: OpenSSL v1.0.1 released into the wild with the bug
  • March? 2014: Bug is discovered by some combination of Neel Mehta at Google Security and Matti Kamunen, Antti Karjalainen and Riku Hietamäki from Codenomicon and reported to the OpenSSL project.
  • March-April 2014: NCSC-FI and OpenSSL work to notify some subset of stakeholders ahead of time of the vulnerability, apparently with a patch and a workaround
  • April 7th 2014: News breaks of the vulnerability and the NCSC-FI team needs to go public with it so the rest of the world can fix their web servers
