My Open Source Bookshelf

I have this really bad habit of starring repositories on GitHub and never actually looking at the code. I did this with the Ruby on Rails source long before I decided to take the plunge and learn the internals. For me, it was intimidating. I gave a few pokes at the code, but I often left quickly feeling like it was beyond my comprehension. This barrier I put up for myself spanned farther than Rails into the professional world. I wasn’t a full time developer yet, and while I hadn’t seen any “professional” Rails apps, I feared that when I did, it would be beyond my comprehension as well.

It turns out I was wrong. In my first week as a full-time Rails developer, I learned that my new team was tackling many of the same issues I was in my own apps. Their code was a little more battle-tested and complicated, but not unreachable.

This gave me incentive to jump into Rails. As the weeks went by, I learned some really cool things about Rails as well as the tools that were helping me learn: Ruby, Git, and GitHub. Over time, I had these Aha! moments that helped boost my learning. That’s what I’m excited to share with you! Hopefully you can take the things I learned and apply them to your own Rails learning process.

Just a warning, there are some side effects to perusing the Rails source code. You’ll learn more about:

  • Ruby - Reading through thousands of lines of Ruby code, you may be introduced to new Ruby methods, optimizations, and metaprogramming.
  • Git - While doing Rails research, you’ll probably learn some new Git commands and some options for commands you already know.
  • Open Source - Watching the Rails repository gave me a look into how a project can organize releases, plan for major changes, and deprecate functionality. It’s knowledge that doesn’t seem to be taught in class or sold in a book, but it’s useful to know, and through GitHub, you can see it firsthand.
  • Magic - Many people say that there’s a lot of “Magic” in Rails. They equate the ability to do very little to get a project up and running with “Magic”. It turns out, there’s no magic. There’s just a well-designed API that does a lot of heavy lifting for users.

The Source of Magic

Designing "magic" is outlined within the Rails Doctrine: programmer happiness, conventions, exalting beautiful code. It's all there to take away and bring into our own practices. It just takes writing code with those principles in mind.

Getting Set Up To Peruse

To start digging in, we first need a copy of the Rails source code. If you only want to peruse the code, cloning it from GitHub will suffice, but I recommend following the guides to get the source code set up for testing. The tests can help you better understand a method.

Once that is set up, you’re ready to start perusing.

Top Level View

When you cd into the Rails directory, you’ll see a folder for each of its modules.

~ $ cd rails/
~/rails $ ls
actioncable       activerecord   CODE_OF_CONDUCT.md
actionmailer      activesupport  CONTRIBUTING.md
actionpack        railties       README.md
actionview        MIT-LICENSE
activejob         rails.gemspec
activemodel       guides

Rails is broken down into modules, each being its own gem. This is partly so that they can be interchangeable. If you want to use another ORM for your Rails application, you can use it instead of Active Record. That also means you can use some of the modules outside of Rails. You can use Active Record for database interactions in a non-Rails app, or use the many methods that people love from Active Support in another project.

A typical Rails request

Here’s a general breakdown of the modules in terms of a typical Rails request:

  1. You have your app running in development with rails s which listens to http://localhost:3000.
  2. You go to http://localhost:3000/posts/12. That request is going to go to your Routes, which will parse the path, /posts/12.
  3. The routes will send the parsed request to the correct controller, here the Posts controller.
  4. The controller will grab an object from the Post model and find the right view.
  5. The controller will pass that back to the routes and out goes the response.

In that request, the Routes and Controller are both handled by the Rails module Action Pack. Action Pack is actually two modules itself: the Routes are handled by Action Dispatch, and the Controller is handled by Action Controller. The models are usually Active Record or sometimes Active Model. The views are handled by Action View. The command line of the app and initialization when you start up the app is handled by Railties. And sprinkled throughout all of these is Active Support.
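
To make that flow a little more concrete, here is a toy sketch in plain Ruby. None of it is Rails code: ToyRouter, ToyPostsController, ToyPost, and the one-line "view" are hypothetical stand-ins for Action Dispatch, Action Controller, Active Record, and Action View, and the real modules do far more than this.

```ruby
# Toy stand-ins only; none of these classes exist in Rails.

# "Model": an in-memory table instead of Active Record.
class ToyPost
  ROWS = { 12 => "Reading the Rails source" }.freeze

  attr_reader :id, :title

  def initialize(id, title)
    @id = id
    @title = title
  end

  def self.find(id)
    new(id, ROWS.fetch(id))
  end
end

# "Controller": loads the model and renders a view.
class ToyPostsController
  def show(params)
    post = ToyPost.find(params[:id])
    render(post)
  end

  private

  # "View": a one-line template instead of Action View.
  def render(post)
    "<h1>#{post.title}</h1>"
  end
end

# "Routes": parse the path and dispatch to a controller action.
class ToyRouter
  ROUTES = [
    [%r{\A/posts/(?<id>\d+)\z}, ToyPostsController, :show]
  ].freeze

  def call(path)
    ROUTES.each do |pattern, controller, action|
      match = pattern.match(path)
      next unless match

      return controller.new.public_send(action, { id: match[:id].to_i })
    end
    "404 Not Found"
  end
end

puts ToyRouter.new.call("/posts/12")
```

The router owns the path parsing, the controller owns the lookup and rendering, and the response travels back out the way it came in, which mirrors the five steps above.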

There are three additional modules: Action Mailer for mail delivery, Active Job for background jobs/tasks, and Action Cable which allows for integrated websockets.

You’ll also notice that there’s a guides folder in the top level of the repository. This is where the repository for guides.rubyonrails.org resides.

Digging into a module

Most Rails people are familiar with Active Record, so I’m going to use it for a lot of module examples. But everything that I’m doing can be done with each of the Rails modules.

lib  test  activerecord.gemspec  CHANGELOG.md  README.rdoc  ...

In the top level of each module, we see the lib folder which contains the module’s code, a test folder, a README and a Gemspec. The README is a great place to start, and it will give you a summary of the module. If you’re already familiar with the module, I guarantee you’ll find a few takeaways.

The Gemspec shows you the gems that the module depends on to get its work done. For instance, Active Record is dependent on Active Support for all of the magical methods and extensions that it provides, Active Model for the model behaviors it provides, and a gem called Arel for generating many of the SQL queries.

  s.add_dependency "activesupport", version
  s.add_dependency "activemodel",   version

  s.add_dependency "arel", "~> 8.0"

Where to start digging in

The first level in lib contains a file with the name of the module, here active_record.rb, and that has a lot of configuration for how Active Record is loaded when it’s started up. If you’ve heard the terms Eagerloading and Autoloading, this is where it is configured initially. There’s also a rails folder which holds the generators for Active Record, think rails g migration; rails g model. Then there’s a folder with the module name, here active_record, and here you’ll see the code that makes up Active Record.

...
associations.rb                coders
attribute_methods.rb           counter_cache.rb
attribute_mutation_tracker.rb  define_callbacks.rb
attribute.rb                   dynamic_matchers.rb
attribute_set                  enum.rb
attribute_set.rb               errors.rb
...

Some of these names you may recognize as Active Record terms: attribute.rb, schema.rb, relation.rb. Others you may not: attribute_mutation_tracker.rb.
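
If autoloading is new to you, the idea can be sketched with Ruby's built-in Module#autoload, which registers a constant and a file and only reads the file the first time the constant is referenced. The ToyRecord module and the temp file below are made up for illustration, and I'm assuming Active Support's autoload layers its conventions on top of this same mechanism. Eager loading, by contrast, would load every file up front.

```ruby
require "tmpdir"
require "fileutils"

# Write a throwaway file that defines ToyRecord::Relation (hypothetical names).
TOY_DIR = Dir.mktmpdir("toy_record")
at_exit { FileUtils.remove_entry(TOY_DIR) }
File.write(File.join(TOY_DIR, "relation.rb"), <<~RUBY)
  module ToyRecord
    class Relation
    end
  end
RUBY

module ToyRecord
  # Register the constant; the file is not read yet.
  autoload :Relation, File.join(TOY_DIR, "relation.rb")
end

# autoload? returns the registered path while the load is still pending.
PENDING_BEFORE = ToyRecord.autoload?(:Relation)
ToyRecord::Relation # the first reference loads the file
# Once the constant is loaded, autoload? returns nil.
PENDING_AFTER = ToyRecord.autoload?(:Relation)

puts "pending before first use: #{!PENDING_BEFORE.nil?}"
puts "pending after first use:  #{!PENDING_AFTER.nil?}"
```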

API Documentation

The documentation in these files is what makes up the docs on the api.rubyonrails.org website. That means when you clone the Rails source code, you also have the guides as well as the API documentation.

Lots of people wonder why APIdock doesn't have Rails 5. The Rails team is not associated with APIdock, so don't rely on it for up to date info.

Dig into familiar methods

Start perusing the classes with familiar terms. I found new ways of using methods by doing this, and I saw what methods were actually doing by reading their code. For instance, I was surprised to see how find_by was implemented. find_by and where are both pretty similar methods:

Person.find_by(name: "David Heinemeier Hansson")
# => #<Person id:1, name: "David Heinemeier Hansson", ...>

Person.where(name: "David Heinemeier Hansson")
# => #<ActiveRecord::Relation [#<Person id:1, name: "David Heinemeier Hansson", ...>]>

I wanted to see how similar the code was for each of these.

# File activerecord/lib/active_record/relation/finder_methods.rb, line 77
def find_by(arg, *args)
  where(arg, *args).take
rescue RangeError
  nil
end

Looking at the code for find_by, it’s cool to see that find_by is almost like syntactic sugar for where.take.
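
The same shape is easy to reproduce in plain Ruby. This is a minimal sketch, assuming a hypothetical ToyRelation that filters an in-memory array of hashes; it is not how Active Record builds queries, it only shows find_by delegating to where and take.

```ruby
# Hypothetical miniature of a relation; not Active Record's implementation.
class ToyRelation
  def initialize(rows, conditions = {})
    @rows = rows
    @conditions = conditions
  end

  # Returns a new relation with the extra conditions merged in.
  def where(conditions)
    ToyRelation.new(@rows, @conditions.merge(conditions))
  end

  # Returns the first matching row, or nil when nothing matches.
  def take
    to_a.first
  end

  # Same shape as the Rails method above: filter, then take one.
  def find_by(conditions)
    where(conditions).take
  end

  def to_a
    @rows.select { |row| @conditions.all? { |key, value| row[key] == value } }
  end
end

PEOPLE = ToyRelation.new([
  { id: 1, name: "David Heinemeier Hansson" },
  { id: 2, name: "Aaron Patterson" }
])

p PEOPLE.find_by(name: "Aaron Patterson")    # a single row
p PEOPLE.where(name: "Aaron Patterson").to_a # an array holding that row
```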

Ruby’s Introspection Methods

How do you know where find_by lives? It turns out, Ruby has some cool methods to help you out. I found many of these through a blog by Aaron Patterson called I am a puts debugger. He goes into several tactics that he uses when debugging Ruby using simple puts statements, some basic and some more complex.

One of them that I use almost daily is method().source_location.

Post.method(:create).source_location
# => rails/activerecord/lib/active_record/persistence.rb:29

We can use it to see that create is defined in Persistence, and if you look at it, it’s pretty much calling new, then save, and then returning the object.
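
You can try source_location without Rails at all. The sketch below uses a made-up ToyArticle class; the method returns an array whose first two elements are the file and the line number, and it returns nil for methods implemented in C.

```ruby
# Hypothetical class, only here so there is something to look up.
class ToyArticle
  def self.create
    new
  end

  def save
    true
  end
end

# Class methods: ask the class for the Method object.
CREATE_LOCATION = ToyArticle.method(:create).source_location
file, line = CREATE_LOCATION
puts "create is defined at #{file}:#{line}"

# Instance methods: instance_method works without building an instance.
SAVE_LOCATION = ToyArticle.instance_method(:save).source_location
file, line = SAVE_LOCATION
puts "save is defined at #{file}:#{line}"

# Methods implemented in C have no Ruby source, so this prints nil.
p String.instance_method(:upcase).source_location
```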

create: simple ; save: not so simple

method.source_location leads us right to where the create method is defined, but sometimes it’s not so straightforward, and you’ll have to do some digging. If we try to do the same with save, we get led on a little journey. First off, here’s the core of save:

# The core of save lies here
# File activerecord/lib/active_record/persistence.rb, line 124
def save(*args)
  create_or_update(*args)
rescue ActiveRecord::RecordInvalid
  false
end

It’s in Persistence as well. But calling method(:save).source_location leads us to Suppressor. Why the differing locations? It turns out save has to do some work to get to where it needs to be:

Post.new.method(:save).source_location
# => activerecord/lib/active_record/suppressor.rb:41

First, in Suppressor, we’re basically checking if we’re suppressing these kinds of records, meaning that we don’t want them created (see an example here). If so, then we won’t save it and just return true.

[40, 44] in activerecord/lib/active_record/suppressor.rb
   40:
   41:     def save(*) # :nodoc:
=> 42:       SuppressorRegistry.suppressed[self.class.name] ? true : super
   43:     end
   44:

We’re not suppressing, so we’re supered to Transactions. This saves the record’s state in case it fails validations, so that it can be reverted.

[305, 311] in rails/activerecord/lib/active_record/transactions.rb
   305:
   306:     def save(*) #:nodoc:
=> 307:       rollback_active_record_state! do
   308:         with_transaction_returning_status { super }
   309:       end
   310:     end
   311:

Then, that supers us over to a module called Dirty that deals with keeping record of whether or not attributes have changed since a save.

[32, 39] in rails/activerecord/lib/active_record/attribute_methods/dirty.rb
   32:
   33:       # Attempts to +save+ the record and clears changed attributes if successful.
   34:       def save(*)
=> 35:         if status = super
   36:           changes_applied
   37:         end
   38:         status
   39:       end

Nothing is going to happen here yet because you see at the very first line, we’re supering to another place, Validations.

[39, 46] in rails/activerecord/lib/active_record/validations.rb
   39:
   40:     # The validation process on save can be skipped by passing <tt>validate: false</tt>.
   41:     # The regular {ActiveRecord::Base#save}[rdoc-ref:Persistence#save] method is replaced
   42:     # with this when the validations module is mixed in, which it is by default.
   43:     def save(options = {})
=> 44:       perform_validations(options) ? super : false
   45:     end
   46:

Here, validations are checked, and because we have no validations, it will pass. That will super us finally to where we are wanting to be, in Persistence.

[122, 129] in rails/activerecord/lib/active_record/persistence.rb
   122:     # Attributes marked as readonly are silently ignored if the record is
   123:     # being updated.
   124:     def save(*args)
=> 125:       create_or_update(*args)
   126:     rescue ActiveRecord::RecordInvalid
   127:       false
   128:     end
   129:

We could have gone directly to the API website to find that save is in Persistence, but I personally like to fill in the gap that was left by the site and method(:method).source_location. Now we understand what save has to go through before it can do what it needs to do.

So many supers

With lots of supers and arguments to keep track of, it can be hard to dig deeper with just source_location. There is another method we can use when super is involved, called super_method. But in a situation where there are several supers, I tend to use byebug.
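
Here is a small sketch of walking a chain like that with super_method. The Toy modules are hypothetical miniatures of the ones we just stepped through, with their bodies stripped down, so only the lookup order carries over to Rails.

```ruby
# Hypothetical miniatures of the modules above; the bodies are simplified.
module ToyPersistence
  def save(*)
    @persisted = true
  end
end

module ToyValidations
  def save(*)
    valid? ? super : false
  end
end

module ToyDirty
  def save(*)
    status = super
    @changed = false if status
    status
  end
end

module ToyTransactions
  def save(*)
    super # the real module wraps this call in a transaction
  end
end

module ToySuppressor
  def save(*)
    super # the real module returns early when the class is suppressed
  end
end

class ToyModel
  # Later includes sit earlier in the lookup chain, so ToySuppressor is hit first.
  include ToyPersistence
  include ToyValidations
  include ToyDirty
  include ToyTransactions
  include ToySuppressor

  def valid?
    true
  end
end

# Follow super_method until it runs out, collecting each owner on the way.
def save_chain(record)
  chain = []
  current = record.method(:save)
  while current
    chain << current.owner
    current = current.super_method
  end
  chain
end

puts save_chain(ToyModel.new).join(" -> ")
```

Calling source_location on each step of the loop instead of owner would give you the file and line for every save in the chain.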

Testing and Exploring

If you haven’t used byebug before, it allows you to step through the code as it’s executing.

byebug
Post.create

I’ll add byebug right before the line that I want to explore. For a quick run through of byebug, check out the Debugging section of the Ruby on Rails guides. To use byebug, we need a minimal Rails setup.

Rails Bug Report Templates

The Rails team provides several bug report templates, and they’re perfect for what we are trying to do. In fact, they’re a great resource for learning how to debug an issue in Rails.

We’ll use the Active Record master template. It starts with an inline bundler and requiring everything that we need:

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "rails", github: "rails/rails"
  gem "arel", github: "rails/arel"
  gem "sqlite3"
  gem "byebug"
end

require "active_record"
require "minitest/autorun"
require "logger"

Using GitHub in a Gemfile

Notice that the script uses GitHub for the Rails source. This will default to the master branch. And it's doing the same with Arel. That is because the Rails master branch is usually using a version of Arel that has not been released, so it will use the latest from the Arel repository instead.

The Bundler docs show you several ways to use a git repo in your Gemfile, including how to use a branch or specific commit.

Next, the script does some Active Record configuration and sets up the models we’re testing. Here, we are going to connect to the database, define the schema, and add an Active Record model:

# This connection will do for database-independent bug reports.
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
    t.string :title
  end
end

class Post < ActiveRecord::Base
end

I like to pass in at least one attribute to the table schema so that I can follow along and see how attributes change throughout the code.

Last, the script provides a minitest setup with an example:

class BugTest < Minitest::Test
  def test_association_stuff
    post = Post.create!
    post.comments << Comment.create!

    assert_equal 1, post.comments.count
    assert_equal 1, Comment.count
    assert_equal post.id, Comment.first.post.id
  end
end

This is a good playground for a barebones Rails App.

Want a console?

For non-testing related scripts, I remove the Minitest parts and run my commands inline. Also, if I want a console to play around in, I can add Pry to the file and call binding.pry.

    36: 
    37: class Comment < ActiveRecord::Base
    38:   belongs_to :post
    39: end
    40: 
 => 41: binding.pry
    42: post = Post.create!
    43: post.comments << Comment.create!
    44: 
    45: assert_equal 1, post.comments.count
    46: assert_equal 1, Comment.count

[1] pry(main)> 

Unexpected Effects

Now, when debugging and looking at the variables passing, you may hit what I call Heisenberg’s Debugging Uncertainty Principle:

It can be troubling to know simultaneously the exact value and correct execution of a variable. - Werner Heisenbug

Looking at a variable’s value can sometimes affect the code it is passing through. This happens often when going through the Active Record query methods. They use lazy loading for database queries, where it will not call the database query until you need it. This behavior allows you to chain query methods together.

The problem happens when you’re in a query method like where, and you want to see the value of a variable, so you reference it in Byebug. Referencing it executes the query, and parts of the query methods rely on it not having executed yet. This means that if you continue, the code will take a different path than it would have had you not referenced the variable.
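
The effect is easy to model in plain Ruby. The LazyNumbers class below is hypothetical and much simpler than an Active Record relation; it only illustrates how printing a lazy object can flip its state, not how Rails implements loading.

```ruby
# Hypothetical lazy collection; Active Record's relation is far more involved.
class LazyNumbers
  def initialize(filters = [])
    @filters = filters
    @records = nil
  end

  # Chaining only records the filter; nothing is evaluated yet.
  def where(&filter)
    LazyNumbers.new(@filters + [filter])
  end

  def loaded?
    !@records.nil?
  end

  # Evaluates the "query" the first time the results are needed.
  def to_a
    @records ||= (1..10).select { |n| @filters.all? { |f| f.call(n) } }
  end

  # Debuggers and consoles call inspect to print a value, which loads it here.
  def inspect
    "#<LazyNumbers #{to_a.inspect}>"
  end
end

relation = LazyNumbers.new.where(&:even?).where { |n| n > 4 }
puts relation.loaded? # nothing has run yet
relation.inspect      # what a debugger does when you print the variable
puts relation.loaded? # the act of looking changed the object's state
```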

Benefit: Getting Introductions

Running through scripts like these with Byebug can take you on a long but rewarding journey. As you explore the classes you know, you’ll be introduced to classes that you don’t know, and you’ll gain a little context for what they do. Over time, you’ll be acquainted with many more of the classes and the vocabulary/jargon that each one has.

Finding Behavior from Tests

Sometimes, it won’t be clear what certain parts of code are doing. In cases where something looks funny, like a line or block of code, I like changing or deleting it and running the tests. Many times, tests fail providing good documentation for what behavior Rails expects from that line.

I did this a few months back when digging into how Active Record creates attribute methods like post.title=. It took me to a method define_method_attribute=. Here, the code is being passed a name for one of Post’s table columns, say “title”, and it’s creating the method for post.title=. It does so by using metaprogramming to create methods:

activerecord/lib/active_record/attribute_methods/write.rb

  def define_method_attribute=(name)
    safe_name = name.unpack("h*".freeze).first # => 479647c656
    ActiveRecord::AttributeMethods::AttrNames.set_name_cache safe_name, name

    generated_attribute_methods.module_eval <<-STR, __FILE__, __LINE__ + 1
      def __temp__#{safe_name}=(value)
        name = ::ActiveRecord::AttributeMethods::AttrNames::ATTR_#{safe_name}
        write_attribute(name, value)
      end
      alias_method #{(name + '=').inspect}, :__temp__#{safe_name}=
      undef_method :__temp__#{safe_name}=
    STR
  end

This method defines the write method by converting the name variable to hexadecimal and saving that as safe_name. Then, it defines the write method with a name of __temp__ plus the safe_name. After that, it uses alias_method to allow you to call the method from title=. Then it undefines the temp method so you can no longer call it.
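
The hexadecimal step can be tried on its own. In this sketch, safe_name_for and column_name_for are helper names I made up; the only real API involved is String#unpack and Array#pack with the "h*" directive, which reads each byte as two hex digits, low nibble first.

```ruby
# "h*" reads each byte as two hex digits, low nibble first.
def safe_name_for(column_name)
  column_name.unpack("h*").first
end

# Reverses the encoding; pack returns a binary string, so restore UTF-8.
def column_name_for(safe_name)
  [safe_name].pack("h*").force_encoding("UTF-8")
end

puts safe_name_for("title") # 479647c656, matching the comment in write.rb
puts safe_name_for("a$b")   # only hex digits, so always safe in a method name
puts column_name_for(safe_name_for("title"))
```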

It was odd to me that it first creates a temp method, aliases it, and immediately undefines it. It’s taking a few extra steps to create a method. I wanted to figure out why, so I rewrote the code into what I thought it should be. I changed the original method name to be “title=” and removed the alias_method and undef_method.

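  def define_method_attribute=(name)
    # Still needed for the ATTR_ constant lookup below
    safe_name = name.unpack("h*".freeze).first
    ActiveRecord::AttributeMethods::AttrNames.set_name_cache safe_name, name

    generated_attribute_methods.module_eval <<-STR, __FILE__, __LINE__ + 1
      def #{name}=(value)
        name = ::ActiveRecord::AttributeMethods::AttrNames::ATTR_#{safe_name}
        write_attribute(name, value)
      end
    STR
  end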

I ran the tests, and sure enough, several failed. Here’s a look at one failure:

  1) Error:
BasicsTest#test_non_valid_identifier_column_name:
SyntaxError: rails/activerecord/lib/active_record/attribute_methods/write.rb:18: formal argument cannot be a global variable
              def a$b=(value)

  def test_non_valid_identifier_column_name
    weird = Weird.create("a$b" => "value")
    weird.reload
    assert_equal "value", weird.send("a$b")
    assert_equal "value", weird.read_attribute("a$b")
    . . .
  end

It turns out that this code is sidestepping a limitation on ruby method names. Ruby doesn’t allow methods to contain certain characters, but Rails bypasses this by giving the method a temp name, and then uses alias_method to allow us to call it by what we expect. This allows Rails to create methods for table column names that don’t meet Ruby’s method name constraints, like here a$b.
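
You can reproduce both the failure and the workaround outside of Rails. This is a minimal sketch with a hypothetical ToyWeird class and two helper methods of my own; it borrows the temp-name, alias_method, undef_method sequence from the Rails code above but leaves out the name cache.

```ruby
# Hypothetical stand-in for a model; it only stores attributes in a hash.
class ToyWeird
  attr_reader :attributes

  def initialize
    @attributes = {}
  end

  def write_attribute(name, value)
    @attributes[name] = value
  end
end

# Naive version: put the column name straight into the method definition.
def define_writer_directly(klass, name)
  klass.module_eval <<-STR, __FILE__, __LINE__ + 1
    def #{name}=(value)
      write_attribute(#{name.inspect}, value)
    end
  STR
end

# Workaround: define under a safe temporary name, alias it, drop the temp.
def define_writer_with_alias(klass, name)
  safe_name = name.unpack("h*").first
  klass.module_eval <<-STR, __FILE__, __LINE__ + 1
    def __temp__#{safe_name}=(value)
      write_attribute(#{name.inspect}, value)
    end
    alias_method #{(name + '=').inspect}, :__temp__#{safe_name}=
    undef_method :__temp__#{safe_name}=
  STR
end

begin
  define_writer_directly(ToyWeird, "a$b")
rescue SyntaxError
  puts "direct definition is a syntax error"
end

define_writer_with_alias(ToyWeird, "a$b")
weird = ToyWeird.new
weird.send("a$b=", "value")
p weird.attributes
```

The def keyword has to parse the name as source code, while alias_method takes the name as a plain string, which is why the second version accepts characters the first one cannot.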

Using Git to Gain Context

In cases like the attribute_method writer, there’s a lot to gain outside of just reading the code. Within Git, there is a history of the Rails codebase taking you all the way back to 2004:

commit db045dbbf60b53dbe013ef25554fd013baf88134
Author: David Heinemeier Hansson <david@loudthinking.com>
Date:   Wed Nov 24 01:04:44 2004 +0000

    Initial

    git-svn-id: http://svn-commit.rubyonrails.org/rails/trunk@4 5ecf4fe2-1ee6-0310-87b1-e25e094e27de

~/rails $ ls
actionmailer  actionpack  activerecord  doc  railties

The commits tell a story of how the codebase has evolved. With all of the info that it stores, you can find bits and pieces of information in Git that add context to the code you’re researching.

git blame | A way to see when changes were introduced

I wanted to see the original commit for the code that introduced the behavior we just looked at, and it led me on a Git journey, full of tricky paths. I started the easiest way, with Git blame. Git blame will give you the latest commit for each line of code in a file.

~/rails $ git blame activerecord/lib/active_record/attribute_methods/write.rb

80e66cc4 (Xavier Noria   2016-08-06 generated_attribute_methods.module_eval <<-STR, __FILE__, __LINE__ + 1
7c70430c (Ryuta Kamizono 2016-08-23   def __temp__#{safe_name}=(value)
7c70430c (Ryuta Kamizono 2016-08-23     name = ::ActiveRecord::AttributeMethods::AttrNames::ATTR_#{safe_name}
7c70430c (Ryuta Kamizono 2016-08-23     write_attribute(name, value)
7c70430c (Ryuta Kamizono 2016-08-23   end
7c70430c (Ryuta Kamizono 2016-08-23   alias_method #{(name + '=').inspect}, :__temp__#{safe_name}=
7c70430c (Ryuta Kamizono 2016-08-23   undef_method :__temp__#{safe_name}=
7c70430c (Ryuta Kamizono 2016-08-23 STR

Git blame gave me a commit from 2016, but in this case, git blame gave me a simple indentation change. A few months back, the Rails team implemented Rubocop to clean up and enforce a style guide on the code, and git blame picked up a commit by Ryuta shifting the indentation.

git blame -w | no whitespace

We want to ignore these types of commits, and git blame provides a -w option to ignore whitespace changes.

~/rails $ git blame -w activerecord/lib/active_record/attribute_methods/write.rb

b785e921 (Aaron Patterson 2013-07-03 generated_attribute_methods.module_eval <<-STR, __FILE__, __LINE__ + 1
b785e921 (Aaron Patterson 2013-07-03   def __temp__#{safe_name}=(value)
b785e921 (Aaron Patterson 2013-07-03     name = ::ActiveRecord::AttributeMethods::AttrNames::ATTR_#{safe_name}
b785e921 (Aaron Patterson 2013-07-03     write_attribute(name, value)
b785e921 (Aaron Patterson 2013-07-03   end
b785e921 (Aaron Patterson 2013-07-03   alias_method #{(name + '=').inspect}, :__temp__#{safe_name}=
b785e921 (Aaron Patterson 2013-07-03   undef_method :__temp__#{safe_name}=
b785e921 (Aaron Patterson 2013-07-03 STR

Now we get one from Aaron Patterson in 2013. I can run git show b785e921 to see his commit. With a typical commit, we’d be done, but there’s more to the story here. git blame returns only one commit per line, and this is a special case where the code existed before, was deleted, and was then restored in this commit by Aaron. This led me to a cool new option with git log.

git log -S | A way to search commits

In cases of multiple commits, git log’s -S option does a better job. git log -S "search query" returns any commits that added or removed an occurrence of the string you pass in. So if we take a line from the code above to search for, we’ll get back the commits whose changes add or remove that line:

$ git log -S "alias_method #{(name + '=').inspect}"

commit b785e921d186753d905c1d0415b91d0987958028
Author: Aaron Patterson <aaron.patterson@gmail.com>
Date:   Wed Jul 3 14:18:31 2013 -0700

    method transplanting between modules isn't supported on 1.9
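
One way to picture what -S matches is as an occurrence count. The sketch below is a rough model written for this post, not Git's implementation: pickaxe_match? is a made-up helper that treats a change as a match when the search string appears a different number of times before and after. That is why a re-indented line, which tripped up plain git blame, does not show up here.

```ruby
# Rough model of the -S check: a change matches when the number of
# occurrences of the search string differs before and after it.
def occurrences(text, needle)
  text.scan(needle).length
end

def pickaxe_match?(before, after, needle)
  occurrences(before, needle) != occurrences(after, needle)
end

needle = "undef_method"
added      = pickaxe_match?("", "  undef_method :tmp\n", needle)
reindented = pickaxe_match?("undef_method :tmp\n", "    undef_method :tmp\n", needle)
removed    = pickaxe_match?("undef_method :tmp\n", "", needle)

puts "added: #{added}, reindented: #{reindented}, removed: #{removed}"
```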

There are a few options I like to use with -S that make the output more useful:

  • --patch (-p): The patch option includes the changes that were made in the commit.
  • --pickaxe-all: When -S finds a change, it shows only the files that contain the change. With this option, it will show all the changes in that commit. Usually if other files are changed in a commit, those changes are relevant.
  • --reverse: Output the commits from oldest to newest. This shows you the changes as they happened.

This will show me the first commit with that line. I found it, but there’s still work to do. I didn’t have the commit that introduced the functionality, just that specific line. But, it gave me a starting point, and I continued to git blame and git log, until I finally found a commit from Santiago Pastorino in 2011:

$ git log -p --reverse --pickaxe-all -S "alias_method #{(name + '=').inspect}"

commit baa237c974fee8023dd704a4efb418ff0e963de0
Author: Santiago Pastorino <santiago@wyeworks.com>
Date:   Mon Mar 21 21:36:05 2011 -0300

    Allow to read and write AR attributes with non valid identifiers

Along the way, I was able to discover the original purpose of the code, optimizations made to the code, and changes due to bugs. And because we used the --pickaxe-all option, I saw changes made in other files like its counterpart, read.rb that included documentation. You might also notice that the documentation explains everything I’ve been researching, and I should have just started there, but what can you do? Read the documentation probably:

    # But sometimes the database might return columns with
    # characters that are not allowed in normal method names (like
    # 'my_column(omg)'. So to work around this we first define with
    # the __temp__ identifier, and then use alias method to rename
    # it to what we want.

Reblame with GitHub or Vim Fugitive

Another cool way to blame is with a technique called reblaming. Reblaming was introduced to me by Sean Griffin, and it is essentially checking out the file at the commit before the one blame gave you and running git blame again, so that you see what the code looked like before that commit. Both Vim Fugitive, a Vim plugin, and GitHub support reblaming.

Getting even more information with GitHub

GitHub serves as a great extension to Git. Where Git can provide you a commit message, GitHub can provide you a whole conversation. For example, if you’ve migrated an app from Rails 4 to Rails 5, you will have noticed several deprecations that needed to be fixed. One for me was that redirect_to :back was deprecated in favor of redirect_back in controllers.

I wanted to see the reasoning behind the change. When I ran git log -S with part of the deprecation string, I received a commit message from Derek Prior at Thoughtbot discussing that when no referrer is available, redirect_to :back can result in an application error:

commit 13fd5586cef628a71e0e2900820010742a911099
Author: Derek Prior <derekprior@gmail.com>
Date:   Tue Dec 15 20:17:32 2015 -0500

    Add `redirect_back` for safer referrer redirects

    `redirect_to :back` is a somewhat common pattern in Rails apps, but it
    is not completely safe. There are a number of circumstances where HTTP
    referrer information is not available on the request. This happens often
    with bot traffic and occasionally to user traffic depending on browser
    security settings.

    When there is no referrer available on the request, `redirect_to :back`
    will raise `ActionController::RedirectBackError`, usually resulting in
    an application error.

    `redirect_back` takes a required `fallback_location` keyword argument
    that specifies the redirect when the referrer information is not
    available.  This prevents 500 errors caused by
    `ActionController::RedirectBackError`.

Derek provides a great commit message on why this change is important, so I already have a lot of context to the problem. GitHub had even more information to offer.

Search by the commit hash

On GitHub, I can search by the commit hash in the Rails repository to find information related to that commit.

From the search results, we see both a commit and an issue. The issue is really a Pull Request, and GitHub puts the two groups together. In the Pull Request, there is a conversation between Derek and members of the Rails team discussing real life scenarios where you might see the issue Derek’s trying to resolve.

Learning from others

In the conversation, Derek provides a few examples, and he also includes a link to a security site about unsafe redirect practices. As someone wanting to learn, this information is gold. One of the hardest things starting something new is knowing what you don’t know. How do you know if you’re asking the right questions and learning what you need to learn? Before reading this pull request, I did not know that redirect_to :back could be a security or application issue. My experience and handful of Rails projects under my belt didn’t lend to running into this issue along with many of the issues with Rails, but here I was getting to learn from a paid Rails consultant through his pull request.

Digging for Gold in PR’s and Issues

This is what is great about pull requests and issues in Rails. You may not see the issues present in Rails in your day-to-day work, but GitHub allows you to learn about them, reproduce them on your own fork of Rails, and see how other people are using Rails. A lot of pull requests and issues also have little nuggets of knowledge hidden in them. When someone submits a PR, it’s on the submitter to sell the PR to the maintainers on why it’s important, and we can piggyback on that knowledge.

Sometimes it’s other contributors jumping in to explain the importance of the change. Here’s a pull request that changes the concatenation of multiline strings from using + or << to \.

commit b70fc698e157f2a768ba42efac08c08f4786b01c
Author: Akira Matsuda <ronnie@dio.jp>
Date: Thu, 12 Jan 2017 17:39:16 +0900
Reduce string objects by using \ instead of + or << for  concatenating strings

(I personally prefer writing one string in one line no matter how long it is, though)

There’s not much info other than that those are three ways to concatenate a string, so a user on GitHub commented on the commit asking, “what’s the benefit of using \ over + or <<?” Another user jumped in, explaining that the change to \ saves allocations at runtime:

“+ and << are operations performed at the runtime of the program. When you concat strings with spaces or \ and a newline those are interpreted as one string during the program parsing. This saves the runtime allocations.” - gsamokovarov

GitHub extends the value of Git by incorporating community and conversation. Rails and Ruby have a strong community of teachers and learners that are willing to share what they know and tackle problems together.

How Contributing to Rails works

GitHub is also the place where changes are introduced, discussed, and reviewed long before they are merged into the repository, sometimes months before. This makes the information you gain from reading Pull Requests and Issues beneficial.

It all starts with a Pull Request

Changes to Rails start when a pull request is made. Someone makes changes on their fork of Rails, commits them, and sends a pull request to the Rails repository. There may even be an issue linked in the PR, which gives you additional background on what the PR is solving.

The changes are discussed

You can follow along with the conversation as maintainers and contributors discuss the pull request and learn about different ways to solve the problem. The code in the PR may be adjusted to meet the change requests, or scrapped altogether because the contributor wants to take a different route.

More changes are made

The pull request may become several commits as tiny additions/deletions are made, allowing you to see the changes in chunks and giving you insight into how the committer went about solving the final problem. Then, as the maintainers approve it, you may lose some of that contextual information as the commits are rebased into one commit.

The Pull Request is accepted

When the pull request is merged, the commit finally becomes a part of Rails. This is where Git starts, and you already have a great deal of information more than you would get just by reading the commit.

Read What’s Familiar to You

When looking at pull requests and issues, you can look at what’s familiar to you by taking advantage of the labels in GitHub. In the Issues/PR trackers, you can filter by modules. I also like to search for a method’s name to see the issues and PRs related to it. These will help give you an understanding of how the code is structured.

If you want to keep up with the Rails releases and see some of the changes that will be going into them, you can use the milestones in the issues tracker. Here, you’ll find open PRs and issues that are planned to be completed for the next releases. It’s just fun to bring up in conversations and look like a smartie: “Well, a little octocat told me that system testing will be added to Rails 5.1” :smugface:

Reading Along Gets Easier, Over Time

There’s a lot of information to be gained from GitHub, but at first, it can be hard to follow pull requests and commits. I read through many PRs and issues initially with glazed eyes, but from time to time, I’d find a takeaway. The more I did this, and the more I dug into the code, the more takeaways I would find.

Learning Quickly By Tackling Issues

The most effective way of learning I have found is trying to tackle issues. Once you’re more comfortable with reading through the issues, find an issue that you’re comfortable with, preferably with a reproduction script and an attached Pull Request. Before reading the PR, try to tackle the issue yourself. Run through the script and try to find what’s causing the issue. If you figure out the issue and how to fix it, compare what you found to the PR. There are a number of ways to fix problems, and your approaches will probably differ. Even if you get stuck and can’t figure out how to fix the issue, the knowledge you gained from trying will make reading the pull request easier and more valuable.

Or Reproducing an issue

Many times an issue is opened explaining a problem in Rails, but it may not have a reproduction script. Adding a reproduction script is a great way to become comfortable with the different modules of Rails, and it is very helpful to the Rails team. Trying to reproduce an issue with a bug report template gives you a way to play with methods that you may not use every day. If you can reproduce it with a script, comment on the issue saying that you were able to reproduce it and share the script. An easy way to get a shareable copy of the script is with the Gist gem.

Or Find where the issue was introduced (My favorite tool: Git Bisect)

After an issue is reproducible, the next step is seeing whether it was introduced in a particular version of Rails. Sometimes regressions happen, and finding the commit that added the behavior is very handy for fixing the issue. If you want a good rundown of Git Bisect, check out this part of Eileen’s Contributing to Rails talk. Here are the steps I usually take to find the commit that introduced a behavior:

  1. Run the script on the master branch - Let’s say it fails
  2. Run the script on a previous release, say 4.2.8 (you can check out a release tag like so: git checkout v4.2.8) - Let’s say it passes. We now know that the behavior was introduced between the two. Let’s use the handy dandy git bisect.
  3. Run git bisect start - Git Bisect does a binary search to find which commit introduced the behavior. Here we are starting the bisect, and we first have to label the two endpoints.
  4. Run git bisect bad master - We are telling Git Bisect that the behavior is on master.
  5. Run git bisect good v4.2.8 - We are telling Git Bisect that the behavior is not on v4.2.8. This will take you to the middle commit between the two endpoints.
  6. Run the test script and see the results - If the script passes, run git bisect good. If it fails, run git bisect bad. Now Git Bisect will find another midpoint to test.
  7. Redo step 6 until it finds the commit.
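To get a feel for why this only takes a handful of rounds, here is a small sketch of the binary search idea in plain Ruby. It is not how Git implements bisect (real history is a graph, not a flat list), and the commit names and the `first_bad_commit` method are made up for this example.

```ruby
# Hypothetical stand-in for git bisect. `commits` is ordered oldest to newest,
# the first entry is known good and the last entry is known bad.
# The block plays the role of the test script: true means "passes".
def first_bad_commit(commits)
  good = 0                   # index of a commit known to pass
  bad  = commits.length - 1  # index of a commit known to fail

  while bad - good > 1
    middle = (good + bad) / 2
    if yield(commits[middle])
      good = middle          # like running: git bisect good
    else
      bad = middle           # like running: git bisect bad
    end
  end

  commits[bad]
end

commits = %w[v4.2.8 c1 c2 c3 c4 c5 c6 master]

# Pretend the regression landed in c4, so every commit before it passes.
culprit = first_bad_commit(commits) do |commit|
  commits.index(commit) < commits.index("c4")
end

puts culprit
```

Each round cuts the remaining range roughly in half, which is why even a few thousand commits between two releases only need about a dozen test runs.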

NOTE: There’s a tricky area between Rails 5 and Rails 4.2 where Arel will give you problems with bundle install. It’s because the Rails repo was using a release of Arel that can’t be found on RubyGems. The way to solve this problem is:

  • Run git log on the Rails repo to see the date for the current commit. I’m guessing it will be somewhere around December 2015, give or take some months.
  • Clone Arel locally. Run git log --before=2015/12/01, replacing the date with the date from the Rails commit. Take note of the first commit that Arel gives you, and run git checkout <commit-hash>, adding the commit hash there.
  • In the reproduction script, add gem "arel", path: "path-to-your-arel" under the gemfile section. It should now work. Once Rails gets to a commit that uses a released version of Arel, you can comment this out. Don’t delete it, I bet you’ll need it again.

Where to Start

If I’ve intrigued you enough to peruse the Rails source code, here are some initial steps you can take:

  1. Tweet me to let me know!
  2. Read the Ruby on Rails guides. Even if you’ve read parts of it before, give it a full skim.
  3. Read the README for each module.
  4. Look into some of the methods you use often, using the Ruby methods I outlined.
  5. Try to understand a full feature. I did this recently with Cookies.
    • Read its code and read the code of other areas it depends on. For instance, cookies depend on MessageVerifier and MessageEncryptor from Active Support.
    • If the feature is small enough, read all of the logs from git log --patch --reverse
    • Search for it in the Rails issue tracker on GitHub and read open and closed PRs and issues.
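As a reminder for step 4, here is a small sketch of the kind of Ruby introspection that helps when you are hunting for where a method lives. I’m assuming the methods meant there include Method#owner, Method#super_method, and Method#source_location, which are all part of core Ruby. The Greeting module and LoudPerson class are made up for this example; in practice you would call these on a Rails object instead.

```ruby
# Hypothetical module and class standing in for a Rails module
# and a class that includes it.
module Greeting
  def greet(name)
    "Hello, #{name}"
  end
end

class LoudPerson
  include Greeting

  def greet(name)
    super.upcase
  end
end

greet = LoudPerson.new.method(:greet)

# owner: the class or module that defines this version of the method.
puts greet.owner

# super_method: the method that `super` will call next, if any.
puts greet.super_method.owner

# source_location: the file and line number of the definition.
file, line = greet.source_location
puts "#{file}:#{line}"
```

From there you can open the file at that line and keep following super_method until you reach the bottom of the chain.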

It’s not all glamorous, but it is worthwhile.

Additional Resources

This learning process would not have been so easy without the helpful resources provided by other people in our community. Here are several of them that I’ve found incredibly useful throughout my learning process.

  • I am a puts debugger - Aaron Patterson: He has a lot of easy but useful debugging tips for a variety of different problems.
  • Ruby Debugging Magic Cheat Sheet - Richard Schneeman: In a similar vein, Schneems has some additionally good debugging takeaways.
  • Demystifying Contributing to Rails - Eileen Uchitelle: It took me several days to get through this talk because she goes over so many helpful things, and I kept getting distracted by wanting to try them. This talk was so helpful that I somewhat purposefully wrote mine to be a prequel to it.
  • Eileen’s System Test Pull Request: Check out Eileen’s pull request for adding system testing to Rails. Reading through the conversation on her Pull Request and reading through each commit from the beginning gives you a good look into how a Rails feature is added. The commits are all still there instead of being rebased, so you can see her thinking process as it’s built out. Then watch her talk from RailsConf right before mine where she discusses what it was like to add a feature to Rails, the good and the bad.
  • The Bike Shed: If you’re looking for a good podcast, I recommend The Bike Shed. Sean Griffin, one of the hosts, is one of the maintainers for Active Record. Derek Prior, who I mentioned earlier, is also a host and he contributes to Rails often. In many episodes, they talk about issues in Rails or tough work problems with Rails, and the feel of the podcast is more like listening to two coworkers at lunch, rather than listening to a panel like most other podcasts.
  • Crafting Rails 4 Applications - José Valim: In the book, José starts right away with building a Renderer plugin and digs into the internals of Rails to explain how Rendering works within Rails. If you can’t take my word for it, you can take Rafael França’s of the Rails team who recommended it to me.
  • Rails Boot Process - Xavier Noria: Xavier talks about what goes on in the Rails boot process, and it’s a good introduction to everything Rails does when it starts up.

Keeping Up With Rails (Without Wearing Yourself Out)

Watching the Rails repository can be exhausting. Here are a few places to find what’s going on in Rails without having to read every issue and pull request:

  • This Week in Rails: This is a newsletter written weekly by the Rails team consisting of interesting commits, pull requests and more from Rails.
  • Release Notes: The Release notes are a great way to stay up to date with the important changes that happen. They link to the pull requests for the changes, so you can do some extra digging if you want.

Happy Digging

I hope you found some good takeaways, and I hope I’ve inspired you to dig into Rails or your favorite open source project! If you liked the information that I provided you, and would like to read more in the future, you can follow me on Twitter for updates on new blogs. Specifically, here are a few blogs that I have in mind:

  • The Journey of Active Record’s #create - This blog will take one of Active Record’s most commonly used methods and dig through the internals of everything it does.
  • Active Record Associations - “Should this be a :has_many or :belongs_to?” I don’t know how many times I asked myself that. This blog will break down the purpose of each association and what each is doing.
  • Action Pack - Action Pack does a lot. It handles routing, controllers, and a lot of the security inside of Rails. This blog will give a good overview of the interesting things that Action Pack is doing.

A Big <h1> Thank You to Sean Griffin

Sean gave me a lot of insight and motivation while writing my talk, and he continues to help me learn. Thank you Sean!