Providing Technology Training and Mentoring For Modern Technology Adoption
Nashville is the capital and most populous city of Tennessee, with 691,243 residents in 2017.
Austin is the state capital of Texas and had a population of 950,715 in 2017.
Both cities are known for their live-music scenes, outdoor activities, tourism, and venture capitalists (VCs). Ultimately, your team’s goal is to show why one city is a better place to be a VC.
During this project, you’ll work with big data techniques to help in the decision process, discuss many aspects of data analytics, and have a big data guru available for guidance. You and your team are expected to lead a final presentation of your findings on Day 5.
How do you illustrate that one place is better than another? That is entirely up to your team, of course. For example, you could use statistical information to show growth… https://www.statista.com/topics/1618/residential-housing-in-the-us/, https://www.bls.gov/, and https://www.crunchbase.com/. For Austin, you might want to use data from places like https://www.naxtrack.com/blog/austin-venture-capital/, https://data.austintexas.gov/, and http://austintexas.gov/page/open-data-reports.
For Nashville, you could see http://www.venturenashville.com/ and https://data.nashville.gov/.
Scala
The goal on Day 1 will be to immerse students in a hackathon-like environment while they begin exploring the Scala programming language. The Day 1 project will use the Java-based jsoup library to scrape individual websites. The baseline challenge will be: given an arbitrary website, extract the text contents and:
In order to enable student work, the Day 1 lectures will cover:
Hadoop Ecosystem
On Day 2, students will use basic Hadoop tools to store scraped content and perform aggregate analyses. In particular, they will join Crunchbase information about VC funding rounds with website contents to identify features that differentiate VCs.
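As a rough illustration of the Day 2 join, the sketch below uses plain Scala collections rather than any Hadoop tool, purely to show the shape of the operation. The record types (`FundingRound`, `ScrapedPage`, `VcFeatures`), their fields, and the use of a website domain as the join key are all assumptions made for this example; real Crunchbase exports and scraped content will look different.

```scala
// Hypothetical record shapes, invented for illustration only.
final case class FundingRound(domain: String, roundType: String, amountUsd: Long)
final case class ScrapedPage(domain: String, text: String)
final case class VcFeatures(domain: String, totalRaisedUsd: Long, roundCount: Int, wordCount: Int)

object VcJoinSketch {
  // Inner join on domain: keeps only sites that have both a scraped page
  // and at least one funding round, then derives simple aggregate features.
  def joinFeatures(rounds: Seq[FundingRound], pages: Seq[ScrapedPage]): Seq[VcFeatures] = {
    val roundsByDomain: Map[String, Seq[FundingRound]] = rounds.groupBy(_.domain)
    pages.flatMap { page =>
      roundsByDomain.get(page.domain).map { rs =>
        // Whitespace-delimited token count as a stand-in text feature.
        val words = page.text.split("\\s+").count(_.nonEmpty)
        VcFeatures(page.domain, rs.map(_.amountUsd).sum, rs.size, words)
      }
    }
  }
}
```

Word count is only a placeholder feature here; teams would substitute whatever signals they extract from the scraped text, and on a real cluster the same grouping and join would be expressed with the Hadoop tooling covered in the lectures.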
Spark
On Day 3, students will continue to expand their feature analysis using Spark and Spark SQL. They will also use Spark Streaming to generate aggregate information about the scraping process and its progress. Lecture contents will include:
Cassandra and HBase
On Day 4, students will begin merging their team’s information into an overall Cassandra database. They will then compete on analytics based on features introduced by the various teams. The day’s lecture contents will include:
Kafka and Spark Streaming
On Day 5, students will integrate the various components into a full application using Kafka. The formal content will be kept short so that teams can explore topics of interest to them. The lecture portion of the class will cover: