Instantly Turn Web Pages into Data with Import.io
Want to turn the web into data? My name is Alex Gimson and I’m the Community Evangelist at import.io. I’m going to explain exactly what we’re doing at import.io, why we’re doing it, and point you to some helpful links so you can do it too!
So What is Import.io?
Import.io is a cloud-based platform that makes it easy for you to pull data from a website and have this data updated in real time. In other words, we give you programmatic access to any website. We do this by providing you with a number of easy-to-use tools (all of which we’ll look at in this course). Everything is point-and-click, meaning all you need to do is map the data on a page and our algorithms will do the rest for you. When you’re done, you’ll end up with an API (an interface) that gives you real-time access to the data on that website - all without you having to code anything!
For the more tech-savvy among you, we’ve got some pretty powerful data extraction features, including the ability to make custom crawlers, get data from behind a login, and combine multiple data sources so you can query them all at once with a single API call. And if you’re not technical, don’t worry! By the end of this course you’ll be a certified data extraction expert.
Pretty much everything you’ll do with import.io is in the cloud, meaning you can access your data anywhere in the world - so long as there’s an internet connection. We also give you a whole range of download and integration options such as Excel, CSV, JSON or HTML, not to mention a whole range of client libraries.
Why did we build all of this? Because our mission is to structure the web.
So why structure the web? Structured data gives you ultimate control and flexibility. You can analyse it and base your business decisions on hard numbers.
Real Life Examples
Imagine you are a clothing company that is competing with another online brand. Part of your competitive strategy is looking up their prices and adjusting your own accordingly. Instead of setting up someone with a laptop to go through the website and collect the price of every product on the site, you can simply run a crawler and, in a matter of hours, you will have prices for every product that you need. Even better, you’ll have this information in a spreadsheet that can be analysed, visualised (Tableau is great for this!) and updated in real time, saving you lots of money, time and someone’s sanity!
Structuring the web also allows us to visualise cool stories, such as the one we will be looking at today, and to do sentiment analysis on reviews and music lyrics (my personal fav). AND FINALLY, it’s a lot easier than copy and paste or writing your own web scraper!
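To make the price-comparison idea concrete, here is a minimal Python sketch of what you might do once both price lists are sitting in spreadsheets. It is not import.io code: the column names (`product`, `price`), the sample rows and the helper names `load_prices` and `undercut_report` are all made up for illustration, and a real export would have whatever columns you mapped.

```python
import csv
import io


def load_prices(csv_text):
    """Map product name -> price from CSV text.

    Assumes hypothetical 'product' and 'price' columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["product"]: float(row["price"]) for row in reader}


def undercut_report(ours, theirs):
    """List (product, our_price, their_price) where the competitor is cheaper."""
    return sorted(
        (name, price, theirs[name])
        for name, price in ours.items()
        if name in theirs and theirs[name] < price
    )


# Made-up sample data standing in for two exported spreadsheets.
OUR_CSV = "product,price\nDenim Jacket,59.99\nWool Scarf,19.50\nRain Boots,45.00\n"
THEIR_CSV = "product,price\nDenim Jacket,54.99\nWool Scarf,21.00\nRain Boots,45.00\n"

if __name__ == "__main__":
    report = undercut_report(load_prices(OUR_CSV), load_prices(THEIR_CSV))
    for name, mine, competitor in report:
        print(f"{name}: ours {mine:.2f}, theirs {competitor:.2f}")
```

With real exports you would read the files from disk instead of the inline strings, and the same comparison would flag every product where the other brand is currently cheaper.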
From web page, to data table, to viz with import.io
Our Four Main Tools
- Firstly, our most popular tool is our magic tool. This needs no interaction and can be found on our website - import.io. Simply put a URL into the box and our algorithms will convert the webpage into a table that you can manipulate and use.
- Next up is our most important tool - the extractor. This is used when your data sits on a single page, such as a football table.
- Our next tool is a crawler. This tool is used when your data sits across more than 5 webpages - a good example is getting information for all of the clothing products on a website. After you train the tool on 5 pages, it will automatically go through the rest of the website and get all of the data based on this training.
- Finally, our last tool is our connector tool. This uses page interactions such as searches and page clicks to display the data. You can then query the API with different inputs so that different data is displayed, and the tool will extract that.
Interested? Get More Info Here
Essential watching: my getting started and advanced extraction techniques webinars. These offer a hands-on approach to data extraction and are a bit more interactive than documentation!
That was a quick introduction to import.io and I really hope it gives you the tools you need to succeed!
Alex Gimson - Community Evangelist at import.io
Twitter: @Alex Gimson