Using R for Homeless Services Data Analysis
In my last post, I described the thought process behind moving our custom reporting out of our HMIS software. Since that posting, we have only become more convinced that, while it was a difficult decision, it is the best decision for our CoC's needs.
The easy decision (for me) was what to switch to. For a number of years I had a software crush on R and felt like it was something I'd never actually know or speak to on a daily basis. I liked that R was free and open source and that its community makes a giant effort to encourage leadership in women and to make room for ALL people to access the software and the training needed. See their R Community Diversity and Inclusion Working Group to check out the excellent work happening there. These values align with mine and with the mission of the work that I do. That alone would convince me, but from a purely practical standpoint, it is also super powerful and smart and fast. It is full of possibilities: R can transform and process data very quickly, and R Shiny and R Markdown can help you present your data in literally whatever different ways you can dream up.
My only aim with this particular blog post is to give you just enough to get started and see whether you are interested in exploring R more. I will walk through a few decision points to help shape your thinking process.
Exporting from HMIS
Depending on your aims, you will want to think about the reality of getting the data out of your HMIS. Will you need it daily? Weekly? Less often than that? We definitely need it daily. As to how to get the data out, HUD requires that all HMIS vendors be able to produce a CSV export. It is divine, and working with it in R is super fast and easy. What HUD does not mandate is how long that CSV export needs to take to run. :/ Your vendor may be able to produce it quickly, but mine does not. Like, it takes a LONG time. We may have to use XML instead, but I am working on a few ideas. Either way, this piece is very important to the success of your project.
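To give a rough idea of what "working with it in R" looks like, here is a small base R sketch that reads one CSV file into a data frame. The folder, the file name Project.csv, the column names, and the rows are all made up for illustration (the snippet writes its own tiny file so it runs anywhere); a real export will have more files and many more columns.

```r
# Stand-in for one file from an HMIS CSV export. The folder, file name,
# columns, and rows are invented for illustration only.
export_dir <- tempfile("hmis_export_")
dir.create(export_dir)
writeLines(
  c("ProjectID,ProjectName,OperatingStartDate,OperatingEndDate",
    "1,Example Shelter,01/15/2015,",
    "2,Example Outreach,07/01/2012,06/30/2018"),
  file.path(export_dir, "Project.csv")
)

# Read the file into a data frame. na.strings = "" turns empty cells into NA,
# and stringsAsFactors = FALSE keeps text columns as plain text.
projects <- read.csv(
  file.path(export_dir, "Project.csv"),
  stringsAsFactors = FALSE,
  na.strings = ""
)

str(projects)   # column names and types
nrow(projects)  # number of rows read
```

With a real export you would point file.path() at wherever your vendor's files land and repeat the read for each file you need.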
Well, this is the best part! Once you have your data loaded in R, you can take what you already know about data analysis and apply it to transforming your data with R. R definitely has an intimidation factor. It doesn't help that most R users seem to be scientists, like actual scientists, not "data scientists". But hear me out! If you have ever written or modified a report in ART, or if you have any kind of data analysis experience, and you are thinking you can't use a command line to manipulate data, you should REALLY just try it first. Tell me if you can understand the following (I will tell you ahead of time that & means "and", | means "or", "mdy" means month-day-year, and is.na means "is null"):
filter(providers, mdy(Operating_Start_Date) < today() & (is.na(Operating_End_Date) | mdy(Operating_End_Date) > today()) )
This basically gives us all active providers as of today. Yes, it's different from ART and SQL, but it is still the same idea. In your head, you would say something more natural like, "I want to only see the providers that have an Operating Start Date that's in the past, and either no End Date or an End Date that's in the future." When you did this in ART or in any other system, you coded it with different symbols, but the meaning was the same. It will just take some learning, practice, and time to pick up your R ways.
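If you want to try that filter yourself, here is a small runnable sketch. It assumes the dplyr and lubridate packages are installed (that is where filter(), mdy(), and today() come from), and the providers table is invented for the example, with far-future dates standing in for "not yet".

```r
library(dplyr)      # filter()
library(lubridate)  # mdy(), today()

# A made-up providers table; every name and date here is invented.
providers <- data.frame(
  Provider = c("Shelter A", "Outreach B", "Housing C", "Shelter D"),
  Operating_Start_Date = c("01/15/2015", "07/01/2012", "03/01/2010", "01/01/2999"),
  Operating_End_Date   = c(NA, "06/30/2018", "12/31/2999", NA),
  stringsAsFactors = FALSE
)

# Keep providers that have already started and either have no end date
# or have an end date that is still in the future.
active <- filter(
  providers,
  mdy(Operating_Start_Date) < today() &
    (is.na(Operating_End_Date) | mdy(Operating_End_Date) > today())
)

active$Provider
```

Only "Shelter A" (started, no end date) and "Housing C" (started, end date in the future) come back. "Outreach B" has already closed and "Shelter D" has not started yet.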
Ways to Learn R
I'm just going to list some ways here:
- Read the book "R for Data Science" and start to work through it. There is a free online version, but I recommend buying it if you can because it is good to support the authors. Also learning computer things from actual books is the best!
- Find a local R-Ladies group and go! This has been super helpful for me. The first one I went to was about using GitHub as a versioning tool for R. I still rely on this daily.
- Find a mentor. I posted a cry for help on a Slack channel I'm on and got a response from a guy who's been meeting with me every Friday morning and helping me through difficult spots. This has been super helpful to me because I can save my questions for him and move on without worrying about getting stuck forever. Being in homeless services makes it easier to get this kind of help.
- Find tutorials on YouTube when you need them. Some of the instructional videos on YouTube for R are very good.
- Meet others who are using open source software in homeless services. I found some people on GitHub who are in homeless services and a couple who are using R as well. Just make yourself known to them so you can send them the odd question now and again and maybe you will be of help to them one day.
- Ask questions.
- Read this blog. :)
- Start your own blog. :) :)
Presenting Your Data
Well, that's another blog post, really. But I wanted to bring it up here because the possibilities are really endless, which means some planning is in order. You could decide you want to use R for data transformation, but Tableau or Qlik Sense Cloud for your data presentation. This would allow you to transform your data into simpler, aggregated formats in R before uploading it all to your data visualization tool, where you could create all your sexy visualizations. Many of the newer reporting tools were really built with VISUALIZATIONS in mind. And they know this. For instance, Tableau can speak to R. If you build an object in R, you can somehow connect that to Tableau. I don't know the particulars, but if this idea speaks to you, it may be something you want to check into.
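As a sketch of the "transform in R, visualize somewhere else" idea, the snippet below rolls some client-level rows up into one row per provider and writes the result to a CSV that a visualization tool could pick up. It assumes the dplyr package is installed. The table, column names, and output file name are invented, and it says nothing about how Tableau or Qlik would actually be connected.

```r
library(dplyr)

# Made-up enrollment rows; every value here is invented.
enrollments <- data.frame(
  Provider  = c("Shelter A", "Shelter A", "Outreach B",
                "Housing C", "Housing C", "Housing C"),
  Household = c(101, 102, 103, 104, 105, 105),
  stringsAsFactors = FALSE
)

# One row per provider with two simple counts.
summary_by_provider <- enrollments %>%
  group_by(Provider) %>%
  summarise(
    n_enrollments = n(),                  # rows per provider
    n_households  = n_distinct(Household) # unique households per provider
  )

# Write the small summary table out for the visualization tool to load.
out_file <- file.path(tempdir(), "enrollments_by_provider.csv")
write.csv(summary_by_provider, out_file, row.names = FALSE)
```

The point of the design is that the visualization tool only ever sees the small, already-aggregated table, not the raw client-level data.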
Another way to present your data is to use R Shiny. It is free and open source (unlike Tableau or Qlik) and is part of the R universe. This is the way we are going. I have even already created a thing (a shiny thing!) for our CoC to check Diversion Data Quality. It uses an export I created in ServicePoint, some code I wrote in R, and a Shiny App I created and posted to shinyapps.io. It is not the best, most amazing thing ever created, but it is not ART and it is fast and accurate. If you're interested in seeing it, let me know.
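To give a feel for what a Shiny app is made of, here is a bare-bones sketch. It is not the Diversion Data Quality app described above: the data, the column names, and the layout are all invented, and it assumes the shiny package is installed. A Shiny app is basically two pieces, a ui that says what is on the page and a server function that fills it in.

```r
library(shiny)

# Invented data quality counts; a real app would compute these from an export.
dq <- data.frame(
  Provider      = c("Shelter A", "Outreach B", "Housing C"),
  MissingFields = c(4, 0, 2),
  stringsAsFactors = FALSE
)

# What the page looks like: a title, a dropdown, and a table.
ui <- fluidPage(
  titlePanel("Data quality sketch"),
  selectInput("provider", "Provider", choices = dq$Provider),
  tableOutput("issues")
)

# What the page does: show the rows for whichever provider is selected.
server <- function(input, output) {
  output$issues <- renderTable({
    dq[dq$Provider == input$provider, ]
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment to launch the app locally
```

The runApp() line is commented out so the script does not block when run non-interactively.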
Check with your People
When I was thinking through all of this, I felt pretty lonely about it. Your people don't want you to feel lonely! If you're really considering a move like this, talk about it with whoever will listen. See where their hesitations are, see what questions come up. If you can't answer them, then that gives you some more to learn, more to think about. I became pretty dang convinced about all of this before I had the chance to run the whole thing by everyone in a more formal way. I created a presentation and everything. I was one with the possibility that this would all get put on hold or completely shot down but it did not. I have faced some new surprises and challenges and I just keep talking about them with my people, always keeping them in the loop, even when I don't know what to do.
This is a long one, I know. But to sum it all up, if you are already at the point where you have decided to move your custom reporting outside of HMIS, your next decision points are how and when you will export your data from your HMIS, whether R seems like the right fit for your organization, and if so, how you are going to learn R and how you will present your data. Talk to your people to help guide you through these decision points. Don't forget other HMIS data analysts are also "your people" and we can support each other. :)