Matrix Factorization with Theano

Matrix factorization algorithms factorize a matrix D into two matrices P and Q such that D ≈ PQ. By limiting the dimensionality of P and Q, PQ provides a low-rank approximation of D. Singular value decomposition (SVD) can also produce a low-rank approximation, but standard SVD requires every entry of D to be known, whereas the matrix factorization algorithms considered in this post accommodate missing entries in D.
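To make the handling of missing data concrete, here is a minimal NumPy sketch (not the post's code) of the error term such algorithms commonly minimize: squared reconstruction error summed over the observed entries of D only. It assumes D is a NumPy masked array whose masked entries are the missing values; the function name masked_squared_error is hypothetical.

```python
import numpy as np


def masked_squared_error(D, P, Q):
    """Sum of squared errors between D and P @ Q over observed entries only.

    D is assumed to be a numpy.ma.MaskedArray; masked entries are missing.
    """
    E = D - P @ Q                   # the residual keeps D's mask
    return float((E ** 2).sum())    # MaskedArray.sum() skips masked entries


# Toy example: two ratings are missing (NaN entries become masked).
D = np.ma.masked_invalid(np.array([[5.0, 3.0, np.nan], [4.0, np.nan, 1.0]]))
P = np.ones((2, 1))
Q = np.ones((1, 3))
print(masked_squared_error(D, P, Q))  # 29.0
```

Here PQ is all ones, so the observed errors are 4, 2, 3 and 0, and the two missing entries contribute nothing to the loss.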

For an overview of matrix factorization, I recommend Albert Au Yeung’s tutorial. That post describes matrix factorization, motivates the problem with a ratings prediction task, derives the gradients used by stochastic gradient descent, and implements the algorithm in Python.

For exploratory work, it is convenient if the algorithm is implemented so that its gradients are derived automatically. Then the gradients do not have to be re-derived whenever the loss function changes (e.g., its error term, its regularization term, or both). More generally, automatic differentiation makes it easier to explore ideas in machine learning by removing a time-consuming manual step.

Theano is a Python library that allows users to specify their problem symbolically using NumPy-based syntax. The expressions are compiled to run efficiently on actual data. Theano’s webpage provides documentation and a tutorial.

The following code includes a Theano-based implementation of matrix factorization using batch gradient descent. The parameters are similar to those in the quuxlabs matrix factorization implementation. D is a two-dimensional NumPy masked array (e.g., a ratings matrix in which the mask marks missing ratings), and P and Q are the initial matrix factors. The elements of P and Q are the parameters of the model and are initialized by the function’s caller. The rank of the factorization is determined by the dimensions of P and Q: for a rank-k factorization of an m × n matrix D, P must be m × k and Q must be k × n. Additional parameters specify the number of iterations, the learning rate, and the regularization weight.

The code doesn’t contain any derived gradients. It specifies the loss function and the parameters that the loss function will be minimized with respect to. Theano figures out the rest!
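For contrast with the automatic approach, here is a minimal NumPy sketch, not the author's Theano code, of batch gradient descent on the same kind of problem with the gradients derived by hand. The two gradient lines are exactly the step that automatic differentiation removes. The function name factorize_gd, its defaults, and the parameter names (steps for iterations, alpha for the learning rate, beta for the regularization weight) are assumptions loosely modeled on the quuxlabs parameters, and the regularization scaling shown is one choice among several.

```python
import numpy as np


def factorize_gd(D, P, Q, steps=5000, alpha=0.0002, beta=0.02):
    """Batch gradient descent on a mask-aware, L2-regularized squared error.

    Loss (assumed form):
        sum over observed (i, j) of (D_ij - (P @ Q)_ij) ** 2
        + beta * (||P||_F ** 2 + ||Q||_F ** 2)
    D is a numpy.ma.MaskedArray; P is m x k and Q is k x n.
    """
    observed = ~np.ma.getmaskarray(D)   # True where D has a value
    target = np.ma.getdata(D)
    P = P.astype(float).copy()          # leave the caller's arrays untouched
    Q = Q.astype(float).copy()
    for _ in range(steps):
        # Residual on observed entries; missing entries contribute zero.
        E = np.where(observed, target - P @ Q, 0.0)
        # Hand-derived gradients of the loss above (both use the old P and Q).
        grad_P = -2.0 * E @ Q.T + 2.0 * beta * P
        grad_Q = -2.0 * P.T @ E + 2.0 * beta * Q
        P -= alpha * grad_P
        Q -= alpha * grad_Q
    return P, Q
```

Switching to a different penalty, say an L1 term, would mean re-deriving grad_P and grad_Q by hand; in the Theano version only the loss expression would change.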


Article Highlighter

Auto Highlight is a Chrome extension that automatically highlights the important content on article pages.

Here’s a link to the extension:
https://chrome.google.com/webstore/detail/highlight/dnkdpcbijfnmekbkchfjapfneigjomhh

The source code is on GitHub:
https://github.com/dstein64/highlight

After installing the extension, a highlighter icon appears in the location bar. Clicking that icon highlights important content on the page.


Anchor Graph Hashing in Python

I was recently collaborating on a project that relied on hashing, in the sense used by locality-sensitive hashing (where similar items are likely to receive the same or similar hash codes) rather than the more conventional sense. One of my contributions was a Python implementation of Anchor Graph Hashing (AGH) [1]. The code was integrated into the project, but I’ve also uploaded the AGH module to its own GitHub repository.

https://github.com/dstein64/PyAnchorGraphHasher

[1] Liu, Wei, Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. 2011. “Hashing with Graphs.” In Proceedings of the 28th International Conference on Machine Learning (ICML-11), edited by Lise Getoor and Tobias Scheffer, 1–8. ICML ’11. New York, NY, USA: ACM.


Color Coded Bash Prompt

This post explains what I mean by a color-coded bash prompt and why one is useful, and then presents an implementation.

By a color-coded bash prompt, I mean the use of colors to represent aspects of the environment in which bash is running. An ordinary bash prompt may display a username and hostname. Color coding can indicate whether the user is root, whether the session is running on a local or remote machine, and other aspects of the environment (not covered in this post).

When running multiple terminals at the same time, with some connected to remote machines and/or running as root, color coding makes it easier to keep track of the sessions, and may help prevent inadvertently entering a command as root or on the wrong machine.

I have configured my bash prompt’s colors to depend on whether I am running as root or as a non-root user, and whether I am connected to a local or remote machine. The username is displayed in green for a non-root user and in red for the root user. The hostname is displayed in blue for a local bash session and in cyan for a remote session.

The following image shows the four possible scenarios: 1) on a local machine as a non-root user, 2) on a local machine as root, 3) on a remote machine as a non-root user, and 4) on a remote machine as root.

bash
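A minimal sketch of the color scheme described above, written in Python and not taken from this post's implementation: build_ps1 is a hypothetical helper that returns a bash PS1 string using standard ANSI foreground colors (green or red for the username, blue or cyan for the hostname). The non-printing escapes are wrapped in \[ and \] so bash computes the prompt width correctly. It assumes a session is remote when the SSH_CONNECTION environment variable is set, which is a common heuristic rather than the only one.

```python
import os

# Standard ANSI SGR foreground color codes.
GREEN, RED, BLUE, CYAN = 32, 31, 34, 36


def color(code: int, text: str) -> str:
    # \[ and \] mark the escape sequences as non-printing for bash.
    return f"\\[\\e[{code}m\\]{text}\\[\\e[0m\\]"


def build_ps1(is_root: bool, is_remote: bool) -> str:
    user = color(RED if is_root else GREEN, r"\u")    # \u: username
    host = color(CYAN if is_remote else BLUE, r"\h")  # \h: hostname
    return f"{user}@{host}:\\w\\$ "                   # \w: cwd, \$: '#' for root


if __name__ == "__main__":
    is_root = os.geteuid() == 0
    is_remote = "SSH_CONNECTION" in os.environ  # assumption: SSH sets this variable
    print(build_ps1(is_root, is_remote))
```

One way to use it would be a line such as PS1="$(python3 make_ps1.py)" in ~/.bashrc, where make_ps1.py is a hypothetical file containing the sketch.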


recrun Chrome Extension

recrun is a Chrome extension I developed to provide a clean interface for reading articles on the web. recrun is an acronym that stands for retain essential content, remove unwanted noise. It uses the Diffbot Article API to extract relevant content from article pages (I work at Diffbot 😀).

The extension can be downloaded from the Chrome Web Store, and the source code is available on GitHub.

A Diffbot token is required to use the extension. A free token can be obtained by signing up at https://www.diffbot.com/plans/free.
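The extension itself runs in Chrome, but the API call it depends on can be illustrated in a few lines. Here is a minimal Python sketch that builds a request to Diffbot's v3 Article API and decodes the JSON response. The endpoint URL and the token and url query parameters are assumptions based on Diffbot's public v3 documentation, so check the current docs before relying on them; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for Diffbot's v3 Article API.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def article_request_url(token: str, page_url: str) -> str:
    # urlencode percent-encodes the page URL so it can travel as a query parameter.
    return ARTICLE_ENDPOINT + "?" + urlencode({"token": token, "url": page_url})


def fetch_article(token: str, page_url: str) -> dict:
    # Performs a network request; requires a valid token.
    with urlopen(article_request_url(token, page_url), timeout=30) as resp:
        return json.load(resp)
```

The extracted article data is expected in the response's objects list, which is also an assumption to verify against Diffbot's documentation.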

Usage

  1. Install the extension
  2. Sign up for a Diffbot API token
  3. Navigate to an article web page
  4. Click the eyeglasses icon in the Chrome toolbar (see arrow in image below)

screenshot


Bar Charts for Hacker News Polls

Update 10/14/2013: The tool is now available as an extension on the Chrome Web Store.

I was recently viewing a poll on Hacker News and thought it would be useful to visualize the results of the poll, so I wrote a script for generating bar charts. The image below shows an example.

Bar Chart Example
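The original script draws graphical charts from a live poll page; the sketch below only illustrates the core scaling step, rendering hypothetical (option, votes) pairs as text bars whose lengths are proportional to the highest vote count. The function name text_bar_chart and the sample data are illustrative, and parsing the Hacker News page is not shown.

```python
def text_bar_chart(options, width=40, bar_char="#"):
    """Render (label, votes) pairs as horizontal text bars scaled to `width`."""
    if not options:
        return ""
    max_votes = max(votes for _, votes in options) or 1  # avoid dividing by zero
    label_width = max(len(label) for label, _ in options)
    lines = []
    for label, votes in options:
        bar = bar_char * round(width * votes / max_votes)
        lines.append(f"{label.ljust(label_width)} | {bar} {votes}")
    return "\n".join(lines)


print(text_bar_chart([("Yes", 30), ("No", 15)], width=10))
```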


Assorted Links

  1. Sebastian Thrun on Charlie Rose, April 25, 2012
  2. 10 Things Your Commencement Speaker Won’t Tell You – here’s an idea from the article: “Read obituaries. They are just like biographies, only shorter. They remind us that interesting, successful people rarely lead orderly, linear lives.” Here’s the New York Times Obituaries page: NY Times Obituaries.

Changing Mac Key Bindings

This post explains the steps I took to make my Mac keyboard work more like a PC keyboard. The goal here is not necessarily for keys with the same names to be in the same position (e.g., the Ctrl key), but rather to have the same functionality across platforms when pressing keys located in the same positions.

Until last summer, I mainly used Windows and various distributions of Linux. Last summer I got my first laptop, a MacBook Air. I still use Windows on my desktop computer, and I have Xubuntu installed on VirtualBox virtual machines on both my desktop and laptop.

Windows and Linux (at least the distributions I have used) have similar key bindings. For example, Ctrl-C copies on Windows and in every Linux desktop environment I have used. When I started using OS X, I quickly realized that it uses a different set of key bindings: copying is Command-C, and the Command key on a Mac keyboard is in a different position than the Ctrl key on a PC keyboard.


Project Euler Common Lisp Helper Functions

Project Euler is a site that has math problems that can be solved with the assistance of a computer program (solving the problems without programming would take an unreasonable amount of time). The problems are fun and they are a good way to learn a programming language and some interesting math.

A few years ago I solved the first 77 problems using Common Lisp.

I intend to eventually continue solving more, maybe switching from Lisp to Matlab, Mathematica, or some other language.

The site requests that users do not share solutions, which I have no intention of doing. However, I did accumulate some helper functions that I wrote while solving the problems, and I think they could be helpful for people getting started on Project Euler using Common Lisp.

Zipped: https://github.com/dstein64/euler-lisp-helpers/zipball/master
Repository: https://github.com/dstein64/euler-lisp-helpers

Also, check out Solving Project Euler Problems for some additional ideas.
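The helpers in the repository are written in Common Lisp and are not reproduced here. As an illustration of the kind of utility that comes up repeatedly in early Project Euler problems, here are two hypothetical helpers sketched in Python: a Sieve of Eratosthenes and a digit splitter. Neither is a solution to any problem.

```python
import math


def primes_below(n):
    """Sieve of Eratosthenes: all primes p with p < n."""
    if n < 3:
        return []
    is_prime = bytearray([1]) * n
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross out multiples of p, starting at p * p.
            is_prime[p * p::p] = bytes(len(range(p * p, n, p)))
    return [i for i, flag in enumerate(is_prime) if flag]


def digits(n):
    """Decimal digits of a non-negative integer, most significant first."""
    return [int(c) for c in str(n)]


print(primes_below(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
print(digits(9045))      # [9, 0, 4, 5]
```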


OS X Finder Shortcuts

Update 9/29/2018: The scripts no longer work as-is on macOS 10.14 Mojave. If you’ve already installed them, you will receive the error “Not authorized to send Apple events to Finder. (-1743)”. A workaround is to open each app in Script Editor, copy and paste its code into a new script, and save it as a new app (this does not retain the icons, which will have to be transferred to the new apps). Launching the new apps will then prompt you for permission to run. If you haven’t already installed the apps, you will receive a “Program is damaged…” message when trying to open them in Script Editor. The workaround from the 1/27/2014 update below no longer applies, since the Anywhere option is no longer available in the security settings. Instead, you can temporarily disable Gatekeeper by running sudo spctl --master-disable before initially launching or editing the apps, and re-enable it afterwards with sudo spctl --master-enable. Even then, you will still need the workaround for the -1743 error described above. It would probably be easier to create the apps from scratch using the source code and icons below (be sure to export the scripts as apps using Script Editor).

Update 1/27/2014: Depending on your version of OS X, you may have to hold down the Command key before and while dragging the icons to the Finder toolbar. Also, on OS X 10.8 and newer, opening the programs may fail with the message “Program is damaged and can’t be opened. You should move it to the Trash.” To prevent this, open the Gatekeeper settings in System Preferences > Security & Privacy. Note the current setting for Allow apps downloaded from:, then change it to Anywhere and confirm by pressing Allow From Anywhere. Once a program on your toolbar has launched successfully, it no longer goes through Gatekeeper, so after launching each one you can restore the original setting you noted earlier.

Update 3/4/2012: The zip file now includes the updated version of the muCommander launcher (see comments).

I have uploaded a bunch of AppleScript scripts that I have found useful in OS X’s Finder. They are described below. They were saved with the .app extension. I believe that I did this so that the icons would appear correctly in the Finder toolbar.

I recently started using OS X after years of using Windows and Ubuntu. Over the summer, I purchased a MacBook Air. I had not used OS X much before, but I liked the idea of a sleek user interface on top of a Unix-based OS, where the tools I know from Linux would be either included or available to install. Additionally, two software packages that had kept me from switching from Windows to Linux in the past, Adobe CS and Microsoft Office, were available for OS X. Also, no PC laptop was as thin and fast as the MacBook Air, although this may have changed with the Ultrabooks released recently.

Anyhow, after a few months of using OS X, I have accumulated a few Finder scripts that I have found useful. I downloaded some of these that were already available (and possibly modified them) and wrote the ones that I couldn’t find. The image below shows the shortcuts to these scripts in my Finder toolbar.

