Changes

INBIA (view source)

Revision as of 10:31, 3 April 2019

1,809 bytes added , 10:31, 3 April 2019

no edit summary

}}

The [https://inbia.org/ International Business Innovation Association (INBIA)] has a [http://exchange.inbia.org/network/findacompany directory] containing information on 415 incubators in the United States.

===INBIA===

We retrieved the INBIA data as follows:

#Go to http://exchange.inbia.org/network/findacompany/ and search US

#Change to 100 results per page

#Save HTML page of 0-100

#Choose next page, Save HTML page of 100-200

#Sort Z-A

#Save HTML page 418-318

#Choose next page, Save HTML page of 318-218

#Note that entries 200-218 are still missing, because the A-Z pages cover 0-200 and the Z-A pages cover 418-218; the missing entries start with L and M

#Search US L, Choose page with L as first letter, Save HTML of L

#Search US M, Choose page with M as first letter, Save HTML of M

Then process each of those HTML files with regular expressions in TextPad, applying the following replacements in order:

*Search .*biobubblekey Replace #

*Search ^[^#].*\n Replace NOTHING

*Search .*href=\" Replace NOTHING

*Search <\/a> Replace NOTHING

*Search \"> Replace \t
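The replacements above can be approximated in Python. This is only a sketch, not the procedure we actually ran: the function name <code>extract_rows</code> and the sample HTML line are hypothetical, and it assumes each directory entry sits on one line that contains <code>biobubblekey</code> followed by a single link.

```python
import re

def extract_rows(html: str) -> list[tuple[str, str]]:
    """Return (name, url_extension) pairs from one saved results page."""
    # Step 1: mark every line that mentions biobubblekey with a leading '#'.
    marked = re.sub(r".*biobubblekey", "#", html)
    rows = []
    for line in marked.splitlines():
        # Step 2: keep only the marked lines.
        if not line.startswith("#"):
            continue
        # Step 3: drop everything up to and including href=" (greedy, as in TextPad).
        line = re.sub(r'.*href="', "", line)
        # Step 4: drop the closing anchor tag.
        line = line.replace("</a>", "")
        # Step 5: split on "> (the TextPad step replaces it with a tab).
        url, sep, name = line.partition('">')
        if sep:
            rows.append((name.strip(), url))
    return rows

# Hypothetical input line; the real page markup may differ.
sample = (
    "<html>\n"
    '<div class="biobubblekey"><a href="/?c=companyprofile&amp;UserKey=abc-123">712 Innovations</a>\n'
    "</html>\n"
)
rows = extract_rows(sample)
```

The function already returns the name first, so the later "move columns" step is folded in here.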

Then combine the files, throw out duplicates, move the columns so that the name comes first, and sort. This results in a file without headers whose lines look like:

1863 Ventures/Project 500 /?c=companyprofile&UserKey=4794e0a6-3f61-4357-a1cb-513baf00957e

4th Sector Innovations /?c=companyprofile&UserKey=cc47b04e-1c2a-4019-88b3-05d1163a0d6a

712 Innovations /?c=companyprofile&UserKey=531ad600-e11a-4c74-9f37-bace816b9325

AccelerateHER /?c=companyprofile&UserKey=3c05d1c1-91b5-48ae-8ec3-c77765b10c2b

ACTION Innovation Network /?c=companyprofile&UserKey=5ac08dd0-364d-47b2-8de0-a7536a3b4802
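The combine, de-duplicate, and sort step could also be scripted. The sketch below assumes each saved page has already been reduced to a list of (name, URL extension) pairs; the function names <code>combine</code> and <code>to_tsv</code> are illustrative, and treating the URL extension as the duplicate key is an assumption.

```python
def combine(row_lists):
    """Merge several lists of (name, url) pairs, dropping duplicate URLs."""
    by_url = {}
    for rows in row_lists:
        for name, url in rows:
            # Keep the first name seen for each URL extension.
            by_url.setdefault(url, name)
    # Sort by name, ignoring case.
    return sorted(((name, url) for url, name in by_url.items()),
                  key=lambda row: row[0].casefold())

def to_tsv(rows):
    """Render rows as tab-separated lines without a header."""
    return "".join(f"{name}\t{url}\n" for name, url in rows)

page_az = [
    ("712 Innovations", "/?c=companyprofile&UserKey=531ad600-e11a-4c74-9f37-bace816b9325"),
    ("AccelerateHER", "/?c=companyprofile&UserKey=3c05d1c1-91b5-48ae-8ec3-c77765b10c2b"),
]
page_za = [
    ("AccelerateHER", "/?c=companyprofile&UserKey=3c05d1c1-91b5-48ae-8ec3-c77765b10c2b"),
    ("1863 Ventures/Project 500", "/?c=companyprofile&UserKey=4794e0a6-3f61-4357-a1cb-513baf00957e"),
]
combined = combine([page_az, page_za])
```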

We can now build a crawler that calls http://exchange.inbia.org/network/findacompany/ with the URL extension appended (either encoded or with <nowiki>&amp;</nowiki> replaced with just &). For example, http://exchange.inbia.org/network/findacompany/?c=companyprofile&UserKey=da2dbe35-9afa-4141-9b31-4e2cfd46a5aa gets the company page for Cambridge Innovation Center.
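A minimal sketch of the URL construction and fetch, using only the Python standard library. The helper names <code>profile_url</code> and <code>fetch_profile</code> are hypothetical, and the sketch assumes the directory still serves these pages without requiring a login.

```python
from html import unescape
from urllib.request import urlopen

BASE = "http://exchange.inbia.org/network/findacompany/"

def profile_url(extension: str) -> str:
    """Build a company page URL from a stored URL extension."""
    # Decode &amp; to & and avoid a doubled slash after the base URL.
    return BASE + unescape(extension).lstrip("/")

def fetch_profile(extension: str, timeout: float = 30.0) -> str:
    """Download one company page and return its HTML as text."""
    with urlopen(profile_url(extension), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")

example = profile_url("/?c=companyprofile&amp;UserKey=da2dbe35-9afa-4141-9b31-4e2cfd46a5aa")
```

A real crawler would loop over the combined file and pause between requests; that loop is left out here.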

We can then extract the contact information, including the URL, and the people, using either Beautiful Soup or regular expressions.
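As a rough illustration of the extraction step, the sketch below pulls external website links and email addresses out of a page. It uses the standard library's <code>html.parser</code> rather than Beautiful Soup or regular expressions, so that it runs without extra packages. The class name <code>LinkCollector</code> and the sample HTML are hypothetical; the real company pages have not been inspected here, so the selection rules are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect external websites and mailto addresses from anchor tags."""

    def __init__(self):
        super().__init__()
        self.websites = []
        self.emails = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("mailto:"):
            self.emails.append(href[len("mailto:"):])
            return
        parsed = urlparse(href)
        # Assumption: links back to inbia.org are navigation, not contact data.
        if parsed.scheme in ("http", "https") and not parsed.netloc.endswith("inbia.org"):
            self.websites.append(href)

# Hypothetical page fragment, for illustration only.
sample_page = (
    '<a href="https://example.org/">Website</a> '
    '<a href="mailto:info@example.org">Email</a> '
    '<a href="http://exchange.inbia.org/network">Back</a>'
)
collector = LinkCollector()
collector.feed(sample_page)
```

Extracting the people listed on a page would need selectors based on the actual markup, which this sketch does not attempt.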

AnneFreeman

83

edits
