Email: kellypaulcatalena21614@gmail.com
Phone: 979-218-3549
USDA ARMS Dashboard
Project Description:
Unfortunately I never had the chance to do an internship while I was earning my degree due to stacking up classes every summer and the need to maintain stability for my wife and kiddos. This has made me nervous about starting a new career. I wanted to gain insight into the process from start to finish when something, like a new dashboard, was needed. Luckily, my wife works for a very large, global company and has experience with the partnership between business stakeholders and technical professionals to get things done. She agreed to help me better understand this partnership by acting as my business stakeholder and coming to me with a problem that I needed to solve.
​
Skills Used/Learned:
-
Technical Skills: Basic cloud integration (Microsoft SQL server, Key Vault, Azure Data Factory), SQL (stored procedures, database management), Tableau, API use, Excel
-
Soft Skills: Business requirement gather, stakeholder engagement, communication, perseverance, technical troubleshooting
The Problem:
The business has to manually download data from USDA Quick Stats, transform and merge the data in excel, and then create graphs. This is an annual activity that generally takes several weeks to do (assuming everything goes according to plan). There are times when adhoc, random requests are needed throughout the year.
The Question:
Is there a way to better automate this entire process, so that the graphs are automatically created?
My Approach:
When I heard about the problem I knew I needed to figure out three things:
-
How to automate the retrieval of data
-
How to transform and store the data
-
How to graph the data
When I think about government data that is readily available via something like Quick Stats, I assume there is probably an API in existence that houses the same data. Based on a quick search, I found that the USDA ARMS API houses the same data as the USDA Quick Stats tool. I knew that using the API would allow me to automate the retrieval of data. Now that I knew I could use an API to grab the data, I needed to figure out a way to transform and store the data in a more usable format. I decided that I wanted to make this data easily accessible for the business anytime they need it for adhoc requests, so I wanted to store the data in a SQL database. I could use basic stored procedures to process the JSON response from the API and insert it into tables. Lastly, I could connect the SQL server to a Tableau dashboard, making the graphs "live" or always up-to-date with the most recent data. I had a plan for each of the three things listed above, but I had a big problem.. I had never used an API to get data before and had never automated the full process from start to finish like this before. Considering I had never worked with an API or automated a process like this from start to finish I had a lot of research and learning to do. After a ton of research, I figured out I could easily use Microsoft Azure to pull the data from the API, transform it, and load it into a SQL database (ETL process).
​
The Architectural Design:
​
After a ton of research, I decided to use Azure Data Factory to automate the API calls. The API was limited to 10,000 returned records, so I had to figure out a way to break down the API calls in a way that would return less than 10,000 records. I essentially created lists for each of the parameters required for the API GET requests and used ADF to loop through those lists, dynamically, creating the API calls, making the GET requests, and then using a stored procedure to load the results into tables I had already created in my database. This was a bit easier since the JSON objects returned by the API are the same no matter what combination of parameters are used.
​
The Dashboard:
Since my wife just made up this problem, there were no well-defined details on the kind of dashboards that would prove useful, so after looking at the data some, I created a dashboard with a few graphs that I thought were interesting. With an actual business stakeholder, with a legitimate problem, and a decent idea of what they would like to see, it would be easier to develop useful, meaning graphs and dashboards.
Conclusions:
Overall, this was an extremely rewarding project. I learned some extremely valuable lessons and a few new technical skills.
-
I learned that the relationship between the technical team and the business is extremely important. Without a strong working relationship, projects like this would take way longer than necessary with a final product that may not prove useful. It takes several iterations of business requirements gathering to fully understand the needs of the customer, the pain points, and the final product they would like to see. It is also important to make sure you are communicating what is and is not feasible, technically.
-
I learned that things never go according to plan. After a ton of research and building a great plan to accomplish all of this, I realized actual execution of the plan is extremely tough. Every time I thought I had figured out one issue, I hit another. For example, when I realized I could not do a few broad API calls to grab the data, I thought splitting up the API calls would be a breeze. When I tried running my ADF pipelines, I realized that the API would lock my API key if I hit the API too many times in a certain time period. It took several tries (and several API keys) to finally figure out how to grab all of the data I needed.
-
I learned technology is never straight-forward and costs a ton of money. I did not know anything about cloud costs before doing this project from my free account. The original amount allocated on a free account was depleted pretty quickly the more and more I did in Azure. I now understand you have to be very thoughtful in how you design and use resources in the cloud to conserve money. Also, I thought I had everything figured out. I would create a database in Azure, use the connection string in Tableau and be good to go. I was wrong. To use a SQL server connection string in Tableau you have to download the driver. To download the driver your Tableau account has to be connected to an enterprise license or you have to pay a ton of money for more access. This is when I discovered Tableau Public, which is great, but does not allow you to connect via connection string. This meant I had to download .csv files from my database and use those in Tableau.
-
I learned that adapting quickly is extremely important. Every-time I hit a wall, I had to find a way around that wall. I could not give up and I could not spend days and days trying to do the same thing, 10 different ways expecting a different result. I had to accept that my original plan was not going to work and adjust in real-time.
-
The guidance from the business stakeholder is invaluable. My wife did a great job pretending to be my business customer, but since she was only making up a problem, it was hard for her to clearly define what she needed. The input from the customer is necessary in order to provide a valuable final product.
-
It is important to understand the value of the MVP (minimum viable product). When I was creating the dashboard, I was trying to immediately build a dashboard with multiple graphs, filters, and embedded views. I quickly learned that jumping straight to the biggest, baddest thing is not always the best. The creation of an MVP gives you a base to grow a more powerful, useful final product from. The business customer can provide valuable feedback on the MVP, giving you insight into what needs to be changed, improved, added to meet the needs of the customer.