Perusing the Rails Source Code
May 6, 2017 • Here are some steps you can take to explore the inner workings of Rails and gain context on its design. Understanding how Rails works will allow you to write better Rails applications and better Ruby code.
My Open Source Bookshelf
I have this really bad habit of starring repositories on GitHub and never actually looking at the code. I did this with the Ruby on Rails source long before I decided to take the plunge and learn the internals. For me, it was intimidating. I gave the code a few pokes, but I usually left quickly, feeling like it was beyond my comprehension. This barrier I put up for myself extended beyond Rails, all the way into the professional world. I wasn’t a full-time developer yet, and although I hadn’t seen any “professional” Rails apps, I feared that when I did, they would be beyond me too.
It turns out: I was wrong. In my first week as a full-time Rails developer, I learned that those apps were tackling many of the same issues I had in my own apps. Their solutions were more battle-tested and complicated, but not out of reach.
This gave me incentive to jump into Rails. As the weeks went by, I learned some really cool things about Rails and the tools that helped me learn it: Ruby, Git, and GitHub. Over time, I had Aha! moments that boosted my learning. That’s what I’m excited to share with you! Hopefully you can take the things I learned and apply them to your own Rails learning process.
Just a warning, there are some side effects of perusing the Rails source code. You’ll learn more about:
- Ruby - Reading through thousands of lines of Ruby code, you may be introduced to new Ruby methods, optimizations, and metaprogramming.
- Git - While doing Rails research, you’ll probably learn some new Git commands and some options for commands you already know.
- Open Source - Watching the Rails repository gave me a look into how a project can organize releases, plan for major changes, and deprecate functionality. It’s knowledge that doesn’t seem to be taught in class or sold in a book, but it’s useful to know, and through GitHub, you can see it firsthand.
- Magic - Many people say that there’s a lot of “magic” in Rails. They equate the ability to do very little to get a project up and running with “magic”. It turns out there’s no magic, just a well-designed API that does a lot of heavy lifting for its users.
The Source of Magic
Getting Set Up To Peruse
To start digging in, we first need a copy of the Rails source code. If you only want to peruse the code, cloning it from GitHub will suffice, but I recommend following the Rails guides to set the source code up so you can run its tests. The tests can help you better understand what a method does.
Once that is set up, you’re ready to start perusing.
Top Level View
When you cd into the Rails directory, you’ll see a folder for each of its modules.
Rails is broken down into modules, each being its own gem. This is partly so that they can be interchangeable. If you want to use another ORM for your Rails application, you can use it instead of Active Record. That also means you can use some of the modules outside of Rails. You can use Active Record for database interactions in a non-Rails app, or use the many methods that people love from Active Support in another project.
A typical Rails request
Here’s a general breakdown of the modules in terms of a typical Rails request:
- You have your app running in development with rails s, which listens for requests (on localhost:3000 by default).
- You go to http://localhost:3000/posts/12. That request goes to your routes, which parse the path.
- The routes will send the parsed request to the correct controller, here the Posts controller.
- The controller will grab an object from the Post model and find the right view.
- The controller will pass that back to the routes and out goes the response.
In that request, the Routes and Controller are both handled by the Rails module Action Pack. Action Pack is actually made up of two modules itself: the Routes are handled by Action Dispatch, and the Controller is handled by Action Controller. The models are usually Active Record or sometimes Active Model. The views are handled by Action View. The app's command line and its initialization at startup are handled by Railties. And sprinkled throughout all of these is Active Support.
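To make that flow concrete, here’s a toy version in plain Ruby. Everything in it (ROUTES, dispatch, PostsController, POSTS) is hypothetical and hugely simplified. It is not how Action Dispatch or Action Controller are implemented. It only mirrors the steps above: match the path, pick a controller and action, fetch a record, and render a response.

```ruby
# Toy request flow. Every name here is hypothetical; this is not how
# Action Dispatch or Action Controller are implemented.

# Stand-in for the model layer: rows keyed by id.
POSTS = { 12 => { id: 12, title: "Perusing Rails" } }.freeze

# Stand-in for a controller with a single action.
class PostsController
  def initialize(params)
    @params = params
  end

  def show
    post = POSTS.fetch(@params[:id]) # grab an object from the "model"
    "<h1>#{post[:title]}</h1>"       # render the "view"
  end
end

# Stand-in for the routes: a path pattern plus controller and action.
ROUTES = [
  [%r{\A/posts/(?<id>\d+)\z}, PostsController, :show]
].freeze

# Parse the path, hand the params to the matching controller action,
# and return a [status, body] pair.
def dispatch(path)
  ROUTES.each do |pattern, controller_class, action|
    match = pattern.match(path)
    next unless match

    params = { id: Integer(match[:id]) }
    return [200, controller_class.new(params).public_send(action)]
  end
  [404, "Not Found"]
end

status, body = dispatch("/posts/12")
puts status # prints 200
puts body   # prints <h1>Perusing Rails</h1>
```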
Digging into a module
Most Rails people are familiar with Active Record, so I’m going to use it for a lot of module examples. But everything that I’m doing can be done with each of the Rails modules.
In the top level of each module, we see the lib folder, which contains the module’s code, a test folder, a README, and a gemspec. The README is a great place to start, and it gives you a summary of the module. Even if you’re already familiar with the module, I guarantee you’ll find a few takeaways.
The Gemspec shows you the gems that the module depends on to get its work done. For instance, Active Record is dependent on Active Support for all of the magical methods and extensions that it provides, Active Model for the model behaviors it provides, and a gem called Arel for generating many of the SQL queries.
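A gemspec is just Ruby, so you can build one in memory with RubyGems and list what it depends on. This is a trimmed-down, hypothetical spec shaped like Active Record’s. The version numbers are illustrative assumptions, so check the real activerecord.gemspec for the exact requirements.

```ruby
require "rubygems"

# A trimmed-down, hypothetical gemspec shaped like activerecord.gemspec.
# Version numbers are illustrative only.
spec = Gem::Specification.new do |s|
  s.name    = "activerecord"
  s.version = "5.1.0"
  s.summary = "Illustrative copy of Active Record's dependency list"
  s.authors = ["Example"]
  s.files   = []

  # Runtime dependencies: what Active Record needs to do its work.
  s.add_dependency "activesupport", "5.1.0"
  s.add_dependency "activemodel",   "5.1.0"
  s.add_dependency "arel",          "~> 8.0"
end

spec.runtime_dependencies.each do |dep|
  puts "#{dep.name} (#{dep.requirement})"
end
```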
Where to start digging in
The first level in lib contains a file with the name of the module, here active_record.rb, which holds a lot of configuration for how Active Record is loaded when it starts up. If you’ve heard the terms eager loading and autoloading, this is where they are initially configured. There’s also a rails folder, which holds the generators for Active Record (think rails g migration and rails g model). Then there’s a folder with the module name, here active_record, which contains the code that makes up Active Record.
Some of these names you may recognize as Active Record terms, like attribute.rb, schema.rb, and relation.rb. Others you may not.
Lots of people wonder why APIdock doesn’t have Rails 5. The Rails team is not associated with APIdock, so don’t rely on it for up-to-date info.
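Active Record’s autoloading comes from ActiveSupport::Autoload, which active_record.rb extends. That module builds on Ruby’s own Module#autoload. Here’s the plain-Ruby mechanism on its own, as a sketch. The Demo module and the widget.rb file it loads are hypothetical, and the file is written to a temp directory on the fly. Eager loading is the opposite trade-off: the files are required up front instead of on first reference.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical namespace; ActiveSupport::Autoload wraps this same
# Module#autoload mechanism.
module Demo; end

dir  = Dir.mktmpdir
path = File.join(dir, "widget.rb")
File.write(path, "module Demo\n  class Widget; end\nend\n")

# Register the constant without loading the file yet.
Demo.autoload(:Widget, path)
before = Demo.autoload?(:Widget) # the registered path: not loaded yet

widget_class = Demo::Widget      # first reference requires widget.rb
after = Demo.autoload?(:Widget)  # nil: nothing left to autoload

puts before
puts after.inspect # prints nil
FileUtils.remove_entry(dir)
```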
Dig into familiar methods
Start perusing the classes with familiar names. Doing this, I found new ways of using methods, and I saw what methods were actually doing by reading their code. For instance, I was surprised to see how find_by was implemented. find_by and where are both pretty similar methods, and I wanted to see how similar the code was for each of them.
Looking at the code for find_by, it’s cool to see that find_by is almost like syntactic sugar for where(...).take.
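In Rails 5, FinderMethods#find_by is roughly where(arg, *args).take, plus a rescue for RangeError. Here’s a toy, in-memory sketch of that shape. ToyPost and ToyRelation are hypothetical stand-ins, not Active Record classes.

```ruby
# Hypothetical stand-in for a relation: a filtered list of rows.
class ToyRelation
  def initialize(records)
    @records = records
  end

  # Filter by exact attribute matches and return another relation,
  # so calls can be chained.
  def where(conditions)
    matches = @records.select { |row| conditions.all? { |k, v| row[k] == v } }
    ToyRelation.new(matches)
  end

  # The first matching record, or nil.
  def take
    @records.first
  end
end

# Hypothetical stand-in for a model class.
class ToyPost
  ROWS = [
    { id: 1, title: "Hello", published: true },
    { id: 2, title: "Draft", published: false }
  ].freeze

  def self.all
    ToyRelation.new(ROWS)
  end

  def self.where(conditions)
    all.where(conditions)
  end

  # The sugar: find_by is where(...).take.
  def self.find_by(conditions)
    where(conditions).take
  end
end

ToyPost.find_by(title: "Draft")   # returns the row with id 2
ToyPost.find_by(title: "Missing") # returns nil
```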
Ruby’s Introspection Methods
How do you know where find_by lives? It turns out Ruby has some cool methods to help you out. I found many of these through a blog post by Aaron Patterson called I am a puts debugger. He goes into several tactics he uses when debugging Ruby with simple puts statements, some basic and some more complex.
One of them that I use almost daily is method(:method_name).source_location. We can use it to see that create is defined in Persistence, and if you look at it, it’s pretty much calling save and then returning the object.
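Here’s source_location on a small, hypothetical stand-in for Active Record’s Persistence module. Method#owner is a handy companion: it tells you which module the method came from, while source_location gives you the file and line. In a Rails app, the equivalent call would be Post.method(:create).source_location.

```ruby
# Hypothetical stand-in, loosely echoing Active Record's Persistence.
module Persistence
  module ClassMethods
    # Build the object, save it, and return it.
    def create(attributes = {})
      object = new(attributes)
      object.save
      object
    end
  end

  def self.included(base)
    base.extend(ClassMethods)
  end

  def save
    true # pretend the row was written
  end
end

class Post
  include Persistence

  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end
end

file, line = Post.method(:create).source_location
puts "create is defined at #{file}:#{line}"
puts Post.method(:create).owner # prints Persistence::ClassMethods
```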
create: simple; save: not so simple
method.source_location leads us right to where the create method is defined, but sometimes it’s not so straightforward, and you’ll have to do some digging. If we try to do the same with save, we get led on a little journey. First off, the core of save is in Persistence as well. But calling method(:save).source_location leads us to Suppressor. Why the differing locations? It turns out save has to do some work to get to where it needs to be:
First, in Suppressor, we check whether records of this kind are being suppressed, meaning that we don’t want them created (the Suppressor module’s documentation has an example). If they are, we skip the save and just return true.
We’re not suppressing, so super takes us to Transactions. This saves the record’s state in case it fails validations, allowing it to revert.
From there, super takes us to a module called Dirty, which keeps track of whether attributes have changed since the last save.
Nothing happens here yet, because on the very first line we’re calling super again, this time into Validations.
Here, validations are checked, and because we have none, they pass. That super finally takes us to where we wanted to be: Persistence.
We could have gone directly to the API website to find that save is in Persistence, but I personally like to fill in the gap between what the site tells me and what method(:save).source_location tells me. Now we understand what save has to go through before it can do what it needs to do.
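Here’s a simplified simulation of that chain in plain Ruby. The module names mirror Active Record’s, but the bodies are made-up stand-ins that only record the order each layer runs in. The real Suppressor, for example, checks a registry keyed by class name rather than a class attribute. The include order is what matters: modules included later sit earlier in the ancestor chain, which is why method(:save) points at Suppressor.

```ruby
# Simplified stand-ins; each layer logs itself, then defers with super.
module Persistence
  def save
    log << :persistence
    true # pretend the row was written
  end
end

module Validations
  def save
    log << :validations
    valid? ? super : false
  end

  def valid?
    true # no validations defined
  end
end

module Dirty
  def save
    log << :dirty
    status = super
    @changes_applied = true if status # only after a successful save
    status
  end
end

module Transactions
  def save
    log << :transactions
    super # Rails wraps this in a transaction and can roll back state
  end
end

module Suppressor
  def save
    log << :suppressor
    self.class.suppressed ? true : super
  end
end

class Record
  # Later includes come earlier in the ancestor chain.
  include Persistence
  include Validations
  include Dirty
  include Transactions
  include Suppressor

  class << self
    attr_accessor :suppressed
  end

  def log
    @log ||= []
  end
end

record = Record.new
puts record.method(:save).owner # prints Suppressor
record.save
p record.log # prints [:suppressor, :transactions, :dirty, :validations, :persistence]
```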
So many supers
With lots of supers and arguments to keep track of, it can be hard to dig deeper with just source_location. For super calls, there is another method we can use: super_method. But in a situation where there are several supers, I tend to use byebug.
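super_method returns the method that super would call next, or nil at the end of the chain, so you can walk the whole chain in a loop. Here's a minimal sketch with hypothetical modules:

```ruby
# Hypothetical modules; each save defers to the next one with super.
module Base
  def save
    :saved
  end
end

module Middle
  def save
    super
  end
end

module Outer
  def save
    super
  end
end

class Widget
  include Base
  include Middle
  include Outer
end

# Follow super_method until it returns nil, collecting owner and line.
def super_chain(method)
  chain = []
  while method
    chain << [method.owner, method.source_location&.last]
    method = method.super_method
  end
  chain
end

super_chain(Widget.new.method(:save)).each do |owner, line|
  puts "#{owner} (line #{line})"
end
```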
Testing and Exploring
If you haven’t used byebug before, it allows you to step through the code as it’s executing. I add a call to byebug right before the line that I want to explore. For a quick run-through of byebug, check out the Debugging section of the Ruby on Rails guides. To use byebug, we need a minimal Rails setup.
Rails Bug Report Templates
The Rails team provides several bug report templates, and they’re perfect for what we are trying to do. In fact, they’re a great resource for learning how to debug an issue in Rails.
We’ll use the Active Record master template. It starts with an inline Bundler gemfile and then requires everything that we need.
Using GitHub in a Gemfile
The Bundler docs show you several ways to use a git repo in your Gemfile, including how to use a branch or specific commit.
Next, the script does some Active Record configuration and sets up the models we’re testing. It connects to the database, defines the schema, and adds an Active Record model.
I like to pass in at least one attribute to the table schema so that I can follow along and see how attributes change throughout the code.
Last, the script provides a Minitest setup with an example test.
This is a good playground for a bare-bones Rails app.
Want a console?
For scripts that aren’t about testing, I remove the Minitest parts and run my commands inline. Also, if I want a console to play around in, I can add Pry to the file and call binding.pry.
Now, when debugging and looking at the variables being passed around, you may hit what I call Heisenberg’s Debugging Uncertainty Principle:
It can be troubling to know simultaneously the exact value and correct execution of a variable. - Werner Heisenbug
Looking at a variable’s value can sometimes affect the code it is passing through. This happens often when going through the Active Record query methods. They use lazy loading, so a query isn’t run against the database until you need its results. This behavior is what allows you to chain query methods together.
The problem happens when you’re in a query method like where and you want to see the value of a variable, so you reference it in Byebug. Referencing it evaluates it, which can run the query, and parts of the query methods rely on it not having run yet. This means that if you continue, execution can take a different path than it would have if you hadn’t looked at the variable.
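Here’s a toy model of that effect. FakeDatabase, LazyRelation, and the path labels are hypothetical, and the real relation code differs between Rails versions. The point is only that an inspect call can change an object’s state and send later code down a different branch. Calling inspect is what a debugger does when you look at a value.

```ruby
# Hypothetical database that records every query it runs.
class FakeDatabase
  attr_reader :queries

  def initialize(rows)
    @rows = rows
    @queries = []
  end

  def select_all(limit: nil)
    @queries << (limit ? "SELECT ... LIMIT #{limit}" : "SELECT ...")
    limit ? @rows.first(limit) : @rows.dup
  end
end

# Hypothetical relation that defers its query until records are needed.
class LazyRelation
  attr_reader :path_taken

  def initialize(db)
    @db = db
    @loaded = false
  end

  def loaded?
    @loaded
  end

  # Runs the full query once and caches it, flipping loaded? to true.
  def records
    unless @loaded
      @records = @db.select_all
      @loaded = true
    end
    @records
  end

  # Debuggers display values with inspect, which loads records here.
  def inspect
    "#<LazyRelation #{records.inspect}>"
  end

  # Two code paths, chosen by whether the relation is already loaded.
  def take
    if loaded?
      @path_taken = :cached_records
      records.first
    else
      @path_taken = :limit_query
      @db.select_all(limit: 1).first
    end
  end
end

db = FakeDatabase.new([{ id: 1 }, { id: 2 }])

untouched = LazyRelation.new(db)
untouched.take
untouched.path_taken # => :limit_query

observed = LazyRelation.new(db)
observed.inspect     # what "looking at it" in the debugger does
observed.take
observed.path_taken  # => :cached_records
```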
Benefit: Getting Introductions
Running through scripts like these with Byebug can take you on a long but rewarding journey. As you explore the classes you know, you’ll be introduced to the classes you don’t know, and you’ll gain a little context about what they do. Over time, you’ll become acquainted with many more of the classes and the vocabulary/jargon that each one has.
Finding Behavior from Tests
Sometimes it won’t be clear what certain parts of the code are doing. When a line or block of code looks funny, I like to change or delete it and run the tests. Often, the failing tests provide good documentation of the behavior Rails expects from that line.
I did this a few months back when digging into how Active Record creates attribute methods like post.title=. It took me to a method called define_method_attribute=. Here, the code is passed the name of one of Post’s table columns, say “title”, and it creates the method post.title=. It does so by using metaprogramming to define the method.
This method defines the writer by converting the name variable to hexadecimal and saving that as safe_name. Then it defines the writer with a name of __temp__ plus the safe_name. After that, it uses alias_method to let you call the method as title=. Finally, it undefines the temp method so you can no longer call it.
It was odd to me that it first creates a temp method, aliases it, and immediately undefines it. It’s taking a few extra steps to create a method. I wanted to figure out why, so I rewrote the code into what I thought it should be: I changed the original method name to be “title=” and removed the alias_method and undef_method calls.
I ran the tests, and sure enough, several failed.
It turns out that this code is sidestepping a limitation on Ruby method names. Ruby’s def syntax doesn’t accept method names containing certain characters. Rails gets around this by defining the method under a temporary, safe name. It then uses alias_method, which accepts any name, to let us call it by the name we expect. This allows Rails to create methods for table column names that don’t meet Ruby’s method-name constraints.
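Here’s a plain-Ruby sketch of the same trick, modeled loosely on Rails 5’s define_method_attribute=. The AttributeWriters helper, the Row class, and the “first name” column are hypothetical. Rails writes through write_attribute rather than a hash. A column name with a space can’t appear after def in source code, but alias_method accepts any name.

```ruby
# Hypothetical helper mirroring the temp-method / alias / undef trick.
module AttributeWriters
  def self.define_writer(mod, name)
    # "h*" unpacks the bytes as hex (low nibble first), so safe_name
    # only contains 0-9a-f and is always a legal name fragment.
    safe_name = name.unpack1("h*")

    mod.module_eval <<-RUBY, __FILE__, __LINE__ + 1
      def __temp__#{safe_name}=(value)
        @attributes[#{name.inspect}] = value
      end
      alias_method #{(name + "=").inspect}, :__temp__#{safe_name}=
      undef_method :__temp__#{safe_name}=
    RUBY
  end
end

class Row
  attr_reader :attributes

  def initialize
    @attributes = {}
  end
end

# "first name" could never be written as a def in source code.
AttributeWriters.define_writer(Row, "first name")

row = Row.new
row.public_send("first name=", "Ada")
row.attributes["first name"]   # => "Ada"
row.respond_to?("first name=") # => true
```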
Using Git to Gain Context
In cases like the attribute_method writer, there’s a lot to gain beyond just reading the code. Git holds a history of the Rails codebase that goes all the way back to 2004.
The commits tell a story of how the codebase has evolved. With all of the info that it stores, you can find bits and pieces of information in Git that add context to the code you’re researching.
git blame | A way to see when changes were introduced
I wanted to see the original commit for the code that introduced the behavior we just looked at, and it led me on a Git journey, full of tricky paths. I started the easiest way, with Git blame. Git blame will give you the latest commit for each line of code in a file.
git blame gave me a commit from 2016, but in this case it was just an indentation change. A few months back, the Rails team adopted RuboCop to clean up the code and enforce a style guide, and git blame picked up a commit by Ryuta shifting the indentation.
git blame -w | no whitespace
We want to ignore these types of commits, and git blame provides a -w option to ignore whitespace changes.
Now we get a commit from Aaron Patterson in 2013. I can run git show b785e921 to see it. With a typical commit, we’d be done, but there’s more to the story with this one. git blame returns only one commit per line, and this is a special case: the code existed before, was deleted, and was then restored by Aaron’s revert. This led me to a cool new option for git log.
git log -S | A way to search commits
When several commits are involved, git log’s -S option does a better job. git log -S "search query" returns any commits that added or removed the string you pass in. More precisely, it returns commits that changed the number of times the string appears in a file. So if we take a line from the define_method_attribute= code as our search, we’ll get back the commits that added or removed that line.
There are a few options I like to use with -S that make the output more useful:
- --patch (-p): The patch option includes the changes that were made in the commit.
- --pickaxe-all: When -S finds a change, it normally shows only the files that contain the change. With this option, it shows all the changes in that commit. Usually, if other files were changed in the same commit, those changes are relevant.
- --reverse: Output the commits from oldest to newest. This shows you the changes in the order they happened.
This shows me the first commit containing that line. I found it, but there was still work to do. I didn’t have the commit that introduced the functionality, just that specific line. But it gave me a starting point, and I kept alternating between git blame and git log until I finally found a commit from Santiago Pastorino in 2011.
Along the way, I discovered the original purpose of the code, optimizations made to it, and changes due to bugs. And because we used the --pickaxe-all option, I saw changes made in other files, including documentation in its counterpart, read.rb. You might also notice that this documentation explains everything I’ve been researching. I should have just started there, but what can you do? Read the documentation, probably.
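To make the -S semantics concrete, here’s a toy model in Ruby with a made-up history listed oldest first (as with --reverse). It reports a commit only when the number of occurrences of the search string changes. That's why a whitespace-only commit like the RuboCop one doesn’t show up, while a deletion and its revert both do. Real git does this per file, on the diff of each commit.

```ruby
# A made-up commit: id, summary, and the file contents after it.
Commit = Struct.new(:sha, :summary, :source)

# Report commits that change how many times the needle appears.
def pickaxe(history, needle)
  previous = 0
  history.each_with_object([]) do |commit, hits|
    count = commit.source.scan(needle).length
    hits << commit.sha if count != previous
    previous = count
  end
end

needle = "undef_method :__temp__"

history = [
  Commit.new("a1", "add temp-method trick", "def x\n  undef_method :__temp__x=\nend\n"),
  Commit.new("b2", "fix indentation",       "def x\n    undef_method :__temp__x=\nend\n"),
  Commit.new("c3", "remove it",             "def x\nend\n"),
  Commit.new("d4", "revert the removal",    "def x\n  undef_method :__temp__x=\nend\n")
]

pickaxe(history, needle) # => ["a1", "c3", "d4"]
```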
Reblame with GitHub or Vim Fugitive
Getting even more information with GitHub
GitHub serves as a great extension to Git. Where Git can provide you a commit message, GitHub can provide you a whole conversation. For example, if you’ve migrated an app from Rails 4 to Rails 5, you will have noticed several deprecations that needed to be fixed. One for me was that redirect_to :back was deprecated in favor of redirect_back in controllers.
I wanted to see the reasoning behind the change. When I ran git log -S with part of the deprecation message, I found a commit from Derek Prior at thoughtbot. His message explains that when no referrer is available, redirect_to :back can result in an application error.
Derek provides a great commit message on why this change is important, so I already have a lot of context to the problem. GitHub had even more information to offer.
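Here’s a simplified model of the behavior change, not Action Controller’s code. In Rails 4.2, redirect_to :back read the Referer header and raised ActionController::RedirectBackError when it was missing. Rails 5’s redirect_back(fallback_location: ...) redirects to the fallback instead. The method names and error message below are hypothetical.

```ruby
# Stand-in for ActionController::RedirectBackError.
class RedirectBackError < StandardError; end

# Old style: the Referer header is required.
def old_back_target(headers)
  headers["Referer"] || raise(RedirectBackError, "no Referer header")
end

# New style: fall back when there is no Referer header.
def new_back_target(headers, fallback_location:)
  headers["Referer"] || fallback_location
end

with_referer    = { "Referer" => "https://example.com/posts" }
without_referer = {}

old_back_target(with_referer)                            # => "https://example.com/posts"
new_back_target(without_referer, fallback_location: "/") # => "/"

begin
  old_back_target(without_referer)
rescue RedirectBackError => e
  puts "old behavior raised #{e.class}"
end
```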
Search by the commit hash
On GitHub, I can search by the commit hash in the Rails repository to find information related to that commit.
From the search results, we see both a commit and an issue. The issue is really a Pull Request, and GitHub puts the two groups together. In the Pull Request, there is a conversation between Derek and members of the Rails team discussing real life scenarios where you might see the issue Derek’s trying to resolve.
Learning from others
In the conversation, Derek provides a few examples, and he also includes a link to a security site about unsafe redirect practices. For someone wanting to learn, this information is gold. One of the hardest things about starting something new is knowing what you don’t know. How do you know if you’re asking the right questions and learning what you need to learn? Before reading this pull request, I did not know that redirect_to :back could be a security or application issue. With my experience and the handful of Rails projects under my belt, I hadn’t run into this issue, or many of the other issues people hit with Rails. But here I was, getting to learn from a paid Rails consultant through his pull request.
Digging for Gold in PRs and Issues
This is what is great about pull requests and issues in Rails. You may not run into the issues present in Rails in your day-to-day work, but GitHub allows you to learn about them, reproduce them on your own fork of Rails, and see how other people are using Rails. A lot of pull requests and issues also have little nuggets of knowledge hidden in them. When someone submits a PR, it’s on the submitter to sell the maintainers on why it’s important, and we can piggyback on that knowledge.
Sometimes other contributors jump in to explain the importance of a change. Here’s a pull request that changes the concatenation of multiline strings from using + and << to using \. There isn’t much info beyond the fact that those are three ways to concatenate a string. So a user on GitHub commented on the commit asking, “what’s the benefit of using \ over + or <<?” Another user jumped in, explaining that the change to \ saves allocations at runtime:
“+ and << are operations performed at the runtime of the program. When you concat strings with spaces or \ and a newline those are interpreted as one string during the program parsing. This saves the runtime allocations.” - gsamokovarov
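You can see the difference gsamokovarov describes by compiling each version and reading the bytecode. This is MRI-only because it uses RubyVM::InstructionSequence, and exact instruction names vary a little between Ruby versions. Still, the + and << versions call a method at runtime, while the backslash version is one string literal from the start.

```ruby
# MRI-only: compile each snippet and look at its bytecode.
# The first source string really contains a trailing backslash
# followed by a newline.
backslash_src = ['s = "foo" \\', '    "bar"'].join("\n")
plus_src      = 's = "foo" + "bar"'
shovel_src    = 's = "foo" << "bar"'

def bytecode(src)
  RubyVM::InstructionSequence.compile(src).disasm
end

# One string instruction holding "foobar"; no method call involved.
puts bytecode(backslash_src)
# Two string instructions, then opt_plus (a call to String#+).
puts bytecode(plus_src)
# Two string instructions, then opt_ltlt (a call to String#<<).
puts bytecode(shovel_src)
```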
GitHub extends the value of Git by incorporating community and conversation. Rails and Ruby have a strong community of teachers and learners that are willing to share what they know and tackle problems together.
How Contributing to Rails works
GitHub is also the place where changes are introduced, discussed, and reviewed long before they are merged into the repository, sometimes months before. This makes the information you gain from reading Pull Requests and Issues beneficial.
It all starts with a Pull Request
Changes to Rails start when a pull request is made. Someone makes changes on their fork of Rails, commits them, and sends a pull request to the Rails repository. There may even be an issue linked in the PR that gives you additional background on what the PR is solving.
The changes are discussed
You can follow along with the conversation as maintainers and contributors discuss the pull request and learn about different ways to solve the problem. The code in the PR may be adjusted to meet the change requests, or scrapped altogether because the contributor wants to take a different route.
More changes are made
The pull request may become several commits as tiny additions/deletions are made, allowing you to see the changes in chunks and giving you insight into how the committer went about solving the final problem. Then, as the maintainers approve it, you may lose some of that contextual information as the commits are rebased into one commit.
The Pull Request is accepted
When the pull request is merged, the commit finally becomes a part of Rails. This is where Git’s history starts, and by following the pull request you already have far more information than you would get just by reading the commit.
Read What’s Familiar to You
When looking at pull requests and issues, you can focus on what’s familiar to you by taking advantage of the labels in GitHub. In the Issues/PR trackers, you can filter by module. I also like to search for a method’s name to see the issues and PRs related to it. These help give an understanding of the way the code is structured.
If you want to keep up with the Rails releases and see some of the changes that will be going into them, you can use the milestones in the issues tracker. Here, you’ll find open PRs and issues that are planned to be completed for the next releases. It’s just fun to bring up in conversations and look like a smartie: “Well, a little octocat told me that system testing will be added to Rails 5.1” :smugface:
Reading Along Gets Easier, Over Time
There’s a lot of information to be gained from GitHub, but at first, it can be hard to follow pull requests and commits. Initially I read through many PRs and issues with glazed eyes, but from time to time, I’d find a takeaway. The more I did this, and the more I dug into the code, the more takeaways I found.
Learning Quickly By Tackling Issues
The most effective way of learning I have found is trying to tackle issues. Once you’re more comfortable with reading through the issues, find an issue that you’re comfortable with, preferably with a reproduction script and an attached Pull Request. Before reading the PR, try to tackle the issue yourself. Run through the script and try to find what’s causing the issue. If you figure out the issue and how to fix it, compare what you found to the PR. There are a number of ways to fix problems, and your approaches will probably differ. Even if you get stuck and can’t figure out how to fix the issue, the knowledge you gained from trying will make reading the pull request easier and more valuable.
Or Reproducing an issue
Many times an issue is opened explaining a problem in Rails, but it may not have a reproduction script. Adding one is a great way to become comfortable with the different modules of Rails, and it is very helpful to the Rails team. Trying to reproduce an issue with a bug report template gives you a way to play with methods that you may not use every day. If you can reproduce it with a script, comment on the issue saying that you were able to reproduce it and share the script. An easy way to get a shareable copy of the script is with the Gist gem.
Or Find where the issue was introduced (My favorite tool: Git Bisect)
After an issue is reproducible, the next step is seeing whether it was introduced in a particular version of Rails. Sometimes regressions happen, and finding the commit that added the behavior is very handy for fixing the issue. If you want a good rundown of Git Bisect, check out this part of Eileen’s Contributing to Rails talk. Here are the steps I usually take to find the commit that introduced a behavior:
1. Run the script on the master branch. Let’s say it fails.
2. Run the script on a previous release, say 4.2.8 (you can check out release tags like so: git checkout v4.2.8). Let’s say it passes. We now know that the behavior was introduced between the two. Let’s use the handy dandy git bisect.
3. git bisect start - Git Bisect does a binary search to find which commit introduced the behavior. Here we are starting the bisect, and first we have to label the known bad and good points.
4. git bisect bad master - We are telling Git Bisect that the behavior is on master.
5. git bisect good v4.2.8 - We are telling Git Bisect that the behavior is not on v4.2.8. This takes you to a commit midway between the two.
6. Run the test script and see the results. If the script passes, run git bisect good. If it fails, run git bisect bad. Git Bisect will then find another midpoint to test.
7. Redo step 6 until it finds the commit.
NOTE: There’s a tricky area between Rails 4.2 and Rails 5 where Arel will give you problems with bundle install. It’s because the Rails repo was using a version of Arel that can’t be found on RubyGems. The way to solve this problem is:
- Run git log on the Rails repo to see the date of the current commit. I’m guessing it will be somewhere around December 2015, give or take some months.
- Clone Arel locally. Run git log --before=2015/12/01, replacing the date with the date from the Rails commit. Take note of the first commit that Arel gives you, and run git checkout <commit-hash> with that commit’s hash.
- In the reproduction script, add gem "arel", path: "path-to-your-arel" to the gemfile section. It should now work. Once Rails gets to a commit that uses a released version of Arel, you can comment this out. Don’t delete it; I bet you’ll need it again.
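The binary search that git bisect performs is easy to model. In this sketch, the commit names and the point where the regression starts are made up. The block plays the role of running your reproduction script. Like git bisect, it assumes everything before the culprit is good and everything from the culprit onward is bad.

```ruby
# Commits are ordered oldest to newest. The first is known good, the
# last is known bad, and each step tests the midpoint.
def first_bad_commit(commits)
  good  = 0                  # index of a commit known to be good
  bad   = commits.length - 1 # index of a commit known to be bad
  steps = 0

  while bad - good > 1
    mid = (good + bad) / 2
    steps += 1
    if yield(commits[mid]) # block returns true when the script fails
      bad = mid
    else
      good = mid
    end
  end

  [commits[bad], steps]
end

commits = (0..15).map { |i| "commit#{i}" }
regression_starts_at = 11 # hypothetical

culprit, steps = first_bad_commit(commits) do |sha|
  sha.delete_prefix("commit").to_i >= regression_starts_at
end
puts "#{culprit} found in #{steps} steps" # prints "commit11 found in 4 steps"
```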
Where to Start
If I’ve intrigued you to peruse the Rails source code, here are some initial steps you can take:
- Tweet me to let me know!
- Read the Ruby on Rails guides. Even if you’ve read parts of them before, give them a full skim.
- Read the READMEs for each module.
- Look into some of the methods you use often, using the Ruby methods I outlined.
- Try to understand a full feature. I did this recently with Cookies.
- Read its code and read the code of other areas it depends on. For instance, cookies depend on MessageVerifier and MessageEncryptor from Active Support.
- If the feature is small enough, read through its whole history with git log --patch --reverse.
- Search for it in the Rails issue tracker on GitHub and read open and closed PRs and issues.
It’s not all glamorous, but it is worthwhile.
This learning process would not have been so easy without the helpful resources provided by other people in our community. Here are several of them that I’ve found incredibly useful throughout my learning process.
- I am a puts debugger - Aaron Patterson: He has a lot of easy but useful debugging tips for a variety of different problems.
- Ruby Debugging Magic Cheat Sheet - Richard Schneeman: In a similar vein, Schneems has some additionally good debugging takeaways.
- Demystifying Contributing to Rails - Eileen Uchitelle: It took me several days to get through this talk because she goes over so many helpful things, and I kept getting distracted by wanting to try them. This talk was so helpful that I somewhat purposefully wrote my talk to be a prequel to it.
- Eileen’s System Test Pull Request: Check out Eileen’s pull request for adding system testing to Rails. Reading through the conversation on her pull request and through each commit from the beginning gives you a good look into how a Rails feature is added. The commits are all still there instead of being rebased, so you can see her thinking process as it’s built out. Then watch her talk from RailsConf, right before mine, where she discusses what it was like to add a feature to Rails, the good and the bad.
- The Bike Shed: If you’re looking for a good podcast, I recommend The Bike Shed. Sean Griffin, one of the hosts, is one of the maintainers of Active Record. Derek Prior, whom I mentioned earlier, is also a host, and he contributes to Rails often. In many episodes, they talk about issues in Rails or tough work problems with Rails. The podcast feels more like listening to two coworkers at lunch than listening to a panel like most other podcasts.
- Crafting Rails 4 Applications - José Valim: In the book, José starts right away with building a renderer plugin and digs into the internals of Rails to explain how rendering works within Rails. If you don’t want to take my word for it, take that of Rafael França of the Rails team, who recommended it to me.
- Rails Boot Process - Xavier Noria: Xavier talks about what goes on in the Rails boot process, and it’s a good introduction to everything Rails does when it starts up.
Keeping Up With Rails (Without Wearing Yourself Out)
Watching the Rails repository can be exhausting. Here are a few places to find what’s going on in Rails without having to read every issue and pull request:
- This Week in Rails: This is a newsletter written weekly by the Rails team consisting of interesting commits, pull requests and more from Rails.
- Release Notes: The Release notes are a great way to stay up to date with the important changes that happen. They link to the pull requests for the changes, so you can do some extra digging if you want.
I hope you found some good takeaways, and I hope I’ve inspired you to dig into Rails or your favorite open source project! If you liked the information that I provided you, and would like to read more in the future, you can follow me on Twitter for updates on new blogs. Specifically, here are a few blogs that I have in mind:
- The Journey of Active Record’s #create - This blog will take one of Active Record’s most commonly used methods and dig through the internals of everything it does.
- Active Record Associations - “Should this be a :has_many or :belongs_to?” I don’t know how many times I’ve asked myself that. This blog will break down the purpose of each association and what each one is doing.
- Action Pack - Action Pack does a lot. It handles routing, controllers, and a lot of the security inside of Rails. This blog will give a good overview of the interesting things that Action Pack is doing.
A Big <h1> Thank You to Sean Griffin
Sean gave me a lot of insight and motivation while writing my talk, and he continues to help me learn. Thank you Sean!