Towards multi-modal AI systems with open-world cognition