project / September 7th, 2021
The Reddit Careers Bot is a custom automation project that uses Selenium WebDriver to open the Calgary subreddit on Reddit in the Google Chrome browser, sign into my Reddit account, click the thread titled "Weekly Career/Employment Advice & Discussion Thread" that is usually stickied at the top of the subreddit, and scrape all the posts/comments in that thread. The username and text from each post are collected and listed in an email sent to myself, which keeps me informed of all the employment opportunities and career advice shared over the course of the week.
This is a professional portfolio project that was created after completing the 100 Days of Code Python Bootcamp.
After completing the 100 Days of Code Python Bootcamp, all of the projects I had done for my professional portfolio were either web or desktop applications. With this project, I wanted to expand the variety of projects in my portfolio by applying what I learned about using Selenium for automation. I did not have any previous experience using Selenium in a portfolio project, and I thought this was a good chance to automate an aspect of my life. Since the start of the pandemic, job searching has become routine in my everyday life, and the careers thread on the Calgary subreddit is one of many resources I monitor daily. The Reddit Careers Bot makes my job search a bit easier, as it is one less resource I have to check each day: all the posts made in the careers thread over the course of the week are delivered to me on a weekly basis.
Python
Selenium
smtplib
At the beginning of this project, I wrote down a general list of steps I wanted my Python script to achieve:
Open the Reddit URL for the Calgary subreddit in Google Chrome with Selenium Webdriver
Sign into my Reddit account
Click the weekly careers advice and discussion thread that is stickied at the top of the Calgary subreddit
If a careers thread is found, grab all the posts in the thread and create two Python arrays: one for text inside each post, the other for the username of each post.
Take the post text and username inside each array and append it to a string that will be the email message
Send email to my Gmail account
If no careers thread is found, send an email to indicate that no thread was found this week
To start, Selenium is imported into the Python script. Then inside the script, I create a class called RedditBot where a driver object is created from the Selenium WebDriver module. When the script is run, this driver can be used to open a URL in the Google Chrome browser and perform other commands, like clicking links and selecting HTML elements for web scraping, autonomously. The RedditBot class also has three attributes that are empty to start: "posts", which will contain an array of texts from each post inside the careers thread; "users", which will contain an array of usernames from each post; and "message", which will be a string that combines information from both "posts" and "users" to be sent in an email.
RedditBot will have one method, called "get_posts". Inside "get_posts", the driver will be used to open the Calgary subreddit URL in Google Chrome.
After a delay of 5 seconds that allows the page to completely load, the username and password fields on the right side of the page are selected by the driver and filled with my Reddit username and password (which are stored as environment variables in my project folder). The driver then selects and clicks the login button to sign into my Reddit account.
Once signed in, the driver will try to find the HTML hyperlink element with the name "Weekly Career/Employment Advice & Discussion Thread". If found, the driver will click the link to access the thread.
In the thread page, the driver will find the text content for each post through the use of a CSS selector and assign it to the RedditBot class' "posts" attribute as an array of strings. Subsequently, the same thing is done for the RedditBot class' "users" attribute which will contain an array of username strings.
In the comments section of the thread shown in the image above, there are two comments. The top comment is unhidden, which means it has sufficient upvotes (also known as likes) from fellow users and is shown normally. The bottom comment is hidden, which means it has been downvoted (disliked) enough by users that it is not initially shown. When the driver is using CSS selectors to grab the post text and username, the driver will select HTML elements with the class "noncollapsed", which means the email will only contain quality comments that haven't been downvoted and hidden by readers.
Once the "posts" and "users" arrays have been acquired, the "message" string is appended with a header that reads "Here are the posts this week from the weekly career/employment advice & discussion thread in /r/Calgary:". Then, a FOR loop iterates through the "posts" array. For each post text, the name of the user that submitted the post is also retrieved from the "users" array, and the "message" string is appended with a new string that contains a line break, the post text, and the username. After the "posts" array is fully looped through, the "send_email" function is called from the send_email.py file. This is a separate Python script that uses the smtplib library to send an email. The "send_email" function receives the "message" string as an argument and prepares an email with "message" as the body. An email subject, sender, and recipient are specified, with the sender and recipient both being my own Gmail account, and a connection to the Gmail server is secured to encrypt the email message. After the program signs into my Gmail account using my email address and password, the message is composed and sent to my inbox:
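A sketch of the message-building loop and send_email.py under these assumptions (the environment-variable names, subject line, and exact message layout are illustrative, not taken from the original project):

```python
import os
import smtplib
from email.message import EmailMessage


def build_message(posts, users):
    # Header line, then each post's text and author separated by line breaks.
    message = ("Here are the posts this week from the weekly career/employment "
               "advice & discussion thread in /r/Calgary:")
    for post, user in zip(posts, users):
        message += f"\n\n{post}\n- {user}"
    return message


def send_email(message):
    # Address and password come from environment variables; names are assumed.
    my_email = os.environ["MY_EMAIL"]
    my_password = os.environ["MY_PASSWORD"]

    email = EmailMessage()
    email["Subject"] = "Weekly Careers Thread Posts"  # illustrative subject
    email["From"] = my_email
    email["To"] = my_email
    email.set_content(message)

    # starttls() secures the connection to the Gmail server, encrypting the
    # message before signing in and sending.
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()
        server.login(my_email, my_password)
        server.send_message(email)
```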
After the email is sent, the driver will quit and the Google Chrome browser window will close. I can check my Gmail inbox to see the message sent, as seen in the image above.
If no careers thread is found, an exception (NoSuchElementException) is raised and the email will instead say the following:
This was an interesting project that provided a small glimpse into the power of Selenium and how it can be used to web scrape data and automate repetitive tasks. It was especially fascinating to see how a browser can autonomously open and perform commands I wrote in a programming script. The final product was highly visual! It was really exciting watching the driver work and browse on Google Chrome on its own.
To run the script and automatically send an email once a week, the program files can be loaded onto PythonAnywhere and scheduled to run on the day before the weekly careers thread is closed and a new thread is created. Over time, adjustments will have to be made to the script. For example, the name of the careers thread could change, or the thread could be posted biweekly instead of weekly. The script should be maintained and updated accordingly.